ALMOST OPTIMAL SEQUENTIAL TESTS OF DISCRETE COMPOSITE HYPOTHESES



Georgios Fellouris and Alexander G. Tartakovsky 
Abstract: We consider the problem of sequentially testing a simple null hypothesis, $H_0$, versus a composite alternative hypothesis, $H_1$, that consists of a finite set of densities. We study sequential tests that are based on thresholding of mixture-based likelihood ratio statistics and weighted generalized likelihood ratio statistics. It is shown that both sequential tests have several asymptotic optimality properties as error probabilities go to zero. First, for any weights, they minimize the expected sample size within a constant term under every scenario in $H_1$ and at least to first order under $H_0$. Second, for appropriate weights that are specified up to a prior distribution, they minimize within an asymptotically negligible term a weighted expected sample size in $H_1$. Third, for a particular prior distribution, they are almost minimax with respect to the expected Kullback-Leibler divergence until stopping. Furthermore, based on high-order asymptotic expansions for the operating characteristics, we propose prior distributions that lead to a robust behavior. Finally, based on asymptotic analysis as well as on simulation experiments, we argue that both tests have the same performance when they are designed with the same weights.

Key words and phrases: Asymptotic optimality, Generalized likelihood ratio, Minimax 
sequential tests, Mixture-based tests. 

1. Introduction 

Let $\{X_t\}_{t\in\mathbb{N}}$ be a sequence of independent and identically distributed (i.i.d.) random vectors with values in $\mathbb{R}^d$, $d\in\mathbb{N}=\{1,2,\dots\}$, and common density $f$ with respect to some non-degenerate, $\sigma$-finite measure $\nu(dx)$. We consider the problem of sequentially testing $H_0: f\in\mathcal{A}_0$ versus $H_1: f\in\mathcal{A}_1$, where $\mathcal{A}_0$ and $\mathcal{A}_1$ are two disjoint sets of densities with common support. That is, we assume that observations are acquired sequentially and the goal is to select the correct hypothesis as soon as possible.

If $\{\mathcal{F}_t\}$ is the observed filtration, i.e., $\mathcal{F}_t=\sigma(X_1,\dots,X_t)$, a sequential test $\delta=(T,d_T)$ is a pair that consists of an $\{\mathcal{F}_t\}$-stopping time, $T$, and an $\mathcal{F}_T$-measurable (terminal) decision rule, $d_T=d_T(X_1,\dots,X_T)\in\{0,1\}$, that specifies which hypothesis is to be accepted once observations have stopped. In particular, $H_j$ is accepted if $d_T=j$, i.e., $\{d_T=j\}=\{T<\infty,\ \delta\text{ accepts }H_j\}$, $j=0,1$.

An ideal sequential test should have the smallest possible expected sample size under both $H_0$ and $H_1$, while keeping its error probabilities below given tolerance levels. Thus, if $\mathsf{P}_f$ is the underlying probability measure when $X_1$ has density $f$ and $\mathsf{E}_f$ is the corresponding expectation, we say that $\delta^o=(T^o,d_{T^o})\in\mathcal{C}_{\alpha,\beta}$ is an optimal sequential test if

$$\mathsf{E}_f[T^o]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_f[T]\quad\forall f\in\mathcal{A}_0\cup\mathcal{A}_1,$$

where $\mathcal{C}_{\alpha,\beta}$ is the class of sequential tests whose maximal type-I and type-II error probabilities are bounded above by $\alpha$ and $\beta$ respectively, i.e.,

$$\mathcal{C}_{\alpha,\beta}=\Big\{\delta:\ \sup_{f\in\mathcal{A}_0}\mathsf{P}_f(d_T=1)\le\alpha\ \text{ and }\ \sup_{f\in\mathcal{A}_1}\mathsf{P}_f(d_T=0)\le\beta\Big\}.$$

Wald and Wolfowitz (1948) proved that an optimal sequential test exists when both hypotheses are simple, i.e., $\mathcal{A}_0=\{f_0\}$ and $\mathcal{A}_1=\{f_1\}$, and is given by the Sequential Probability Ratio Test (SPRT) that was proposed by Wald (1944) in his seminal work on Sequential Analysis:

$$S=\inf\{t\in\mathbb{N}:\ \Lambda_t^1\notin(A^{-1},B)\},\qquad d_S=\mathbb{1}_{\{\Lambda_S^1\ge B\}},\tag{1.1}$$

where $A,B>1$ are constant thresholds selected so that $\mathsf{P}_0(d_S=1)=\alpha$ and $\mathsf{P}_1(d_S=0)=\beta$, and $\{\Lambda_t^1\}$ is the likelihood ratio statistic

$$\Lambda_t^1=\prod_{n=1}^{t}\frac{f_1(X_n)}{f_0(X_n)},\qquad t\in\mathbb{N}.\tag{1.2}$$
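The SPRT (1.1) only needs the running log-likelihood ratio, which is updated recursively. The following Python sketch is a minimal illustration, not part of the original analysis: it assumes $f_0=N(0,1)$ and $f_1=N(1,1)$ and uses Wald's conservative thresholds $A=1/\beta$, $B=1/\alpha$, which bound the error probabilities by $\beta$ and $\alpha$ rather than matching them exactly. All function names are hypothetical.

```python
import math
import random


def sprt(draw, log_f0, log_f1, A, B, max_n=100_000):
    """Wald's SPRT (1.1): stop at the first t with Lambda_t^1 outside (1/A, B).

    draw() returns the next observation X_t; log_f0/log_f1 are log-densities.
    Returns (S, d_S) with d_S = 1 if H1 is accepted and 0 if H0 is accepted.
    """
    log_a, log_b = math.log(A), math.log(B)
    z = 0.0  # log Lambda_t^1, updated recursively
    for t in range(1, max_n + 1):
        x = draw()
        z += log_f1(x) - log_f0(x)
        if z >= log_b:
            return t, 1
        if z <= -log_a:
            return t, 0
    raise RuntimeError("no decision within max_n observations")


def normal_logpdf(mean):
    """Log-density of N(mean, 1)."""
    return lambda x: -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def error_rate(rng, true_mean, wrong_decision, A, B, reps=2000):
    """Monte Carlo estimate of P(d_S = wrong_decision) when X_t ~ N(true_mean, 1)."""
    f0, f1 = normal_logpdf(0.0), normal_logpdf(1.0)
    wrong = 0
    for _ in range(reps):
        _, d = sprt(lambda: rng.gauss(true_mean, 1.0), f0, f1, A, B)
        wrong += d == wrong_decision
    return wrong / reps


alpha = beta = 0.05
A, B = 1.0 / beta, 1.0 / alpha          # Wald's conservative thresholds
rng = random.Random(1)
type1 = error_rate(rng, 0.0, 1, A, B)   # estimate of P_0(d_S = 1)
type2 = error_rate(rng, 1.0, 0, A, B)   # estimate of P_1(d_S = 0)
print(f"type I ~ {type1:.3f}, type II ~ {type2:.3f}")
```

Because $\Lambda_S^1\ge B$ on $\{d_S=1\}$, a change of measure gives $\mathsf{P}_0(d_S=1)\le1/B$, and similarly $\mathsf{P}_1(d_S=0)\le1/A$, so the simulated rates should fall below the nominal levels.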

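In the case of composite hypotheses, it has only been possible to find sequential tests that are optimal in an asymptotic sense. More specifically, we say that $\delta^o\in\mathcal{C}_{\alpha,\beta}$ is uniformly (first-order) asymptotically optimal if

$$\mathsf{E}_f[T^o]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_f[T]\,(1+o(1))\quad\forall f\in\mathcal{A}_0\cup\mathcal{A}_1$$

as $\alpha,\beta\to0$. When, in particular, $\mathcal{A}_0$ and $\mathcal{A}_1$ can be embedded in an exponential family $\{f_\theta,\ \theta\in\Theta\}$ and $\Theta_1$ is a subset of the natural parameter space $\Theta$ so that $\theta_0\notin\Theta_1$ and

$$\mathcal{A}_0=\{f_{\theta_0}\}\quad\text{and}\quad\mathcal{A}_1=\{f_\theta,\ \theta\in\Theta_1\},\tag{1.3}$$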
it is well known (see, for example, Lorden (1973), Pollak and Siegmund (1975)) that the sequential test (1.1) is uniformly asymptotically optimal if $\Lambda_t^1$ is replaced either by the generalized likelihood-ratio (GLR) statistic, $\sup_{\theta\in\Theta_1}\Lambda_t^\theta$, or by a mixture-based likelihood ratio statistic, $\int_{\Theta_1}\Lambda_t^\theta\,w(\theta)\,d\theta$, where $w(\cdot)$ is some probability density function on $\Theta_1$ (weight function) and $\Lambda_t^\theta$ is defined as in (1.2) with $f_1$ replaced by $f_\theta$. However, apart from certain tractable cases, neither statistic is recursive in general and, as a result, neither can easily be implemented on-line. Moreover, their computation at each step may be only approximate, since it often requires discretization of the parameter space. These problems can be overcome if one uses the adaptive likelihood-ratio statistic $\tilde\Lambda_t=\tilde\Lambda_{t-1}\,f_{\hat\theta_{t-1}}(X_t)/f_{\theta_0}(X_t)$, where $\hat\theta_{t-1}$ is an estimator of $\theta$ that depends on the first $t-1$ observations. However, this approach, initially developed by Robbins and Siegmund (1970, 1974) for power-one tests and later extended by Pavlov (1990) and Dragalin and Novikov (1999) to multihypothesis sequential tests, generally leads to less efficient sequential tests, since one-stage delayed estimators use less information than the global MLE that is employed by the GLR statistic. Sequential testing of composite hypotheses in a Bayesian formulation with a small cost of observations was considered by Chernoff (1972), Kiefer and Sacks (1963), Lai (1988), Lorden (1967), and Schwarz (1962), among others.

In the present paper, we consider the problem of sequentially testing a simple null hypothesis against a discrete alternative consisting of a finite set of densities, i.e., we assume that

$$\mathcal{A}_0=\{f_0\}\quad\text{and}\quad\mathcal{A}_1=\{f_1,\dots,f_K\},\tag{1.4}$$

where $K$ is a positive integer. This hypothesis testing problem has two main motivations. First, it serves as an approximation to the continuous-parameter testing problem (1.3), in which $\Theta_1$ is replaced by a finite subset $\{\theta_1,\dots,\theta_K\}$ of $\Theta_1$ so that $f_j=f_{\theta_j}$, $j=0,1,\dots,K$. Indeed, as mentioned above, the GLR statistic and mixture-based likelihood ratio statistics cannot always be implemented easily on-line, and their computation may require discretization of the parameter space. With (1.4), we discretize the alternative hypothesis itself. This implies a loss of efficiency under $\mathsf{P}_\theta$ when $\theta\notin\{\theta_1,\dots,\theta_K\}$, but it leads to sequential tests that are easily implementable on-line, a very important advantage for many applications.

Second, problem (1.4) naturally applies to multisample (also known as multichannel or multisensor) slippage problems, which have a wide range of applications (see, e.g., Chernoff (1972); Tartakovsky et al. (2003, 2006)). As an example, consider the setup in which $K$ sensors monitor different areas, a signal may be present in at most one of these areas, and the goal is to detect signal presence without identifying its location. If additionally the sensors are statistically independent and sensor $i$ takes i.i.d. observations $\{X_t^i\}_{t\in\mathbb{N}}$ with density $g_1^i$ (resp. $g_0^i$) when the signal is present (resp. absent), this problem turns out to be a special case of (1.4) with $X_t=(X_t^1,\dots,X_t^K)$ and

$$f_0(X_t)=\prod_{j=1}^{K}g_0^j(X_t^j),\qquad f_i(X_t)=g_1^i(X_t^i)\prod_{j\ne i}g_0^j(X_t^j),\quad1\le i\le K.\tag{1.5}$$
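In the slippage model (1.5), the product structure makes $f_i/f_0$ collapse to the likelihood ratio of channel $i$ alone, so $Z_t^i=\log\Lambda_t^i$ depends only on that channel's data. A small numerical check, assuming hypothetical Gaussian channels $g_0^i=N(0,1)$ and $g_1^i=N(\mu_i,1)$ (channels are indexed from 0 in the code):

```python
import math
import random


def normal_logpdf(x, mean):
    """Log-density of N(mean, 1) at x."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


# Hypothetical channel parameters: g_0^i = N(0, 1), g_1^i = N(mu[i], 1).
mu = [0.5, 1.0, 2.0]
K = len(mu)


def log_f0(x):
    """log f_0(X_t): no signal in any channel, cf. (1.5)."""
    return sum(normal_logpdf(x[j], 0.0) for j in range(K))


def log_f(x, i):
    """log f_i(X_t): signal in channel i only, cf. (1.5)."""
    return sum(normal_logpdf(x[j], mu[j] if j == i else 0.0) for j in range(K))


def channel_llr(x, i):
    """log g_1^i(X_t^i) / g_0^i(X_t^i): uses channel i alone."""
    return normal_logpdf(x[i], mu[i]) - normal_logpdf(x[i], 0.0)


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(K)]  # one observation vector X_t
for i in range(K):
    print(i, log_f(x, i) - log_f0(x), channel_llr(x, i))
```

The two printed columns agree up to rounding error.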

For problem (1.4), we consider two sequential tests which are both parametrized by two vectors with positive components (weights), $q_j=(q_j^1,\dots,q_j^K)$, $j=0,1$, and which share the following structure: "stop the first time $t$ at which either $\Lambda_t^{(1)}\ge B$ or $\Lambda_t^{(0)}\le A^{-1}$, and select $H_1$ in the first case and $H_0$ in the latter", where $\{\Lambda_t^{(1)}\}$ and $\{\Lambda_t^{(0)}\}$ are appropriate $\{\mathcal{F}_t\}$-adapted statistics. For the first test, which we call the Mixture Likelihood Ratio Test (MiLRT), the corresponding statistics are given by

$$\Lambda_t^{(1)}=\sum_{i=1}^{K}q_1^i\Lambda_t^i\quad\text{and}\quad\Lambda_t^{(0)}=\sum_{i=1}^{K}q_0^i\Lambda_t^i;$$

for the second test, which we call the Weighted Generalized Likelihood Ratio Test (WGLRT), they are given by

$$\Lambda_t^{(1)}=\max_{1\le i\le K}\big(q_1^i\Lambda_t^i\big)\quad\text{and}\quad\Lambda_t^{(0)}=\max_{1\le i\le K}\big(q_0^i\Lambda_t^i\big),$$

where $\Lambda_t^i$ is the likelihood ratio defined in (1.2) with $f_1$ replaced by $f_i$.

Tartakovsky et al. (2003) studied the GLRT, i.e., the WGLRT with uniform weights, $q_0^i=q_1^i=1$, $1\le i\le K$, in the multichannel setup (1.5) and established its asymptotic optimality. More specifically, it was shown that the GLRT is second-order asymptotically optimal, in the sense that it attains $\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_i[T]$ within an $O(1)$ term for every $1\le i\le K$, where $O(1)$ denotes a term that remains bounded as $\alpha,\beta\to0$. Moreover, it was shown that, in the special case of completely asymmetric channels, the GLRT also attains $\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]$ within an $O(1)$ term. (Here and in what follows we denote by $\mathsf{P}_j$ the underlying probability measure when $X_1$ has density $f_j$ and by $\mathsf{E}_j$ the corresponding expectation, $j=0,1,\dots,K$.)

The first contribution of the present work is that this uniform, second-order asymptotic optimality property is established for both the MiLRT and the WGLRT with arbitrary weights $q_0$ and $q_1$ in the more general setup of problem (1.4). However, the main question we want to answer is how to select these weights in order to obtain further "benefits". In this direction, we show that if $p=(p_1,\dots,p_K)$ is an arbitrary probability mass function, which can be interpreted as a prior distribution on $H_1$, and $q_0,q_1$ are selected so that

$$q_0^i=p_i\,\zeta_i\quad\text{and}\quad q_1^i=p_i/\zeta_i,\qquad1\le i\le K,\tag{1.6}$$

then both tests attain $\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}^p[T]$ within an $o(1)$ term, where $\mathsf{E}^p$ is expectation with respect to the weighted probability measure $\mathsf{P}^p=\sum_{i=1}^{K}p_i\mathsf{P}_i$, and the $\zeta$-numbers $\{\zeta_i\}$, formally introduced in (2.1), provide overshoot corrections that allow us to achieve this refined asymptotic optimality property.

In addition, we find a particular prior distribution which makes both tests almost minimax with respect to the expected Kullback-Leibler (KL) information (divergence) that is accumulated until stopping, in the sense that they attain within an $o(1)$ term

$$\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\max_{1\le i\le K}\big(I_i\,\mathsf{E}_i[T]\big),$$

where $I_i$ is the KL information number (see (2.2)). In this way, we generalize the corresponding result in Fellouris and Tartakovsky (2012), where this minimax problem was considered in the context of open-ended, mixture-based sequential tests.

Moreover, we compare numerically the tests with this (almost) least favorable prior distribution with some alternative choices for $p$ in the context of the multichannel problem (1.5) with channels that take exponential or Gaussian observations. Based on high-order asymptotic expansions for the operating characteristics of both tests, we find that selecting $p_i$ to be proportional to $I_i$ or $\zeta_i$ leads to a much more robust behavior than the one induced by the least favorable prior, especially when the channels have very different signal strengths. Finally, based on these asymptotic expansions as well as on Monte Carlo simulations, we argue that both the WGLRT and the MiLRT have essentially the same performance when they are designed with the same weights.

The remainder of the paper is organized as follows. In Section 2, we introduce basic notation and present some preliminary results. In Section 3, we obtain asymptotic approximations to the operating characteristics of the two tests, whereas in Section 4 we establish their asymptotic optimality properties. In Section 5, we compare different specifications for $p$, and in Section 6 we compare the tests using Monte Carlo simulations. We conclude in Section 7.
2. Notation, Assumptions and Definitions
2.1. Elements of renewal theory 

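For every $1\le i\le K$, we set $Z_t^i=\log\Lambda_t^i$, where $\Lambda_t^i$ is given by (1.2) with $f_1$ replaced by $f_i$. We quantify the "distance" between $f_i$ and $f_0$ using the $\zeta$-number

$$\zeta_i=\exp\Big\{-\sum_{t=1}^{\infty}\frac{1}{t}\big[\mathsf{P}_0(Z_t^i>0)+\mathsf{P}_i(Z_t^i\le0)\big]\Big\},\tag{2.1}$$

as well as the KL information numbers

$$I_i=\mathsf{E}_i[Z_1^i]=\int\log\Big(\frac{f_i(x)}{f_0(x)}\Big)f_i(x)\,\nu(dx),\tag{2.2}$$

$$I_0^i=\mathsf{E}_0[-Z_1^i]=\int\log\Big(\frac{f_0(x)}{f_i(x)}\Big)f_0(x)\,\nu(dx).\tag{2.3}$$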

Without loss of generality, we assume that $f_1,\dots,f_K$ are ordered with respect to their KL divergence from $f_0$ so that

$$I_0=\min_{1\le i\le K}I_0^i=I_0^1=\dots=I_0^r<I_0^{r+1}\le\dots\le I_0^K.\tag{2.4}$$

Note that $r=1$ corresponds to the asymmetric situation in which $I_0$ is attained by a unique index $i=1$. On the other hand, $r=K$ corresponds to the completely symmetric situation in which $I_0^i$ is the same for every $1\le i\le K$. The latter case occurs, for example, in the multisample slippage problem (1.5) when $g_0^i=g_0$ and $g_1^i=g_1$, $1\le i\le K$, i.e., when the densities do not depend on the population (or sensor, in a multisensor context).

In order to avoid trivial cases, we assume that $f_i$ and $f_0$ do not coincide almost everywhere, which implies that $I_i,I_0^i>0$ for every $1\le i\le K$. We also assume throughout the paper that $Z_1^i$ is non-arithmetic under $\mathsf{P}_0$ and $\mathsf{P}_i$ and that $I_i,I_0^i<\infty$ for every $1\le i\le K$. Then, if we define the first hitting times

$$\tau_c^i=\inf\{t:Z_t^i\ge c\},\qquad\sigma_c^i=\inf\{t:Z_t^i\le-c\},\qquad c>0,$$

it is well known that the overshoots $Z^i_{\tau_c^i}-c$ and $|Z^i_{\sigma_c^i}+c|$ have well-defined asymptotic distributions under $\mathsf{P}_i$ and $\mathsf{P}_0$ respectively, i.e.,

$$\mathcal{H}_i(x)=\lim_{c\to\infty}\mathsf{P}_i\big(Z^i_{\tau_c^i}-c\le x\big),\qquad\mathcal{H}_0^i(x)=\lim_{c\to\infty}\mathsf{P}_0\big(|Z^i_{\sigma_c^i}+c|\le x\big),\qquad x>0,$$

and consequently, we can define the Laplace transforms

$$\gamma_i=\int_0^\infty e^{-x}\,\mathcal{H}_i(dx),\qquad\gamma_0^i=\int_0^\infty e^{-x}\,\mathcal{H}_0^i(dx),$$

which connect the KL numbers with the $\zeta$-numbers as follows: $\zeta_i=\gamma_iI_i=\gamma_0^iI_0^i$ (see, e.g., Theorem 5 in Lorden (1977)). These quantities are very important, since they allow us to achieve the desired error probabilities of the SPRT $\delta^i=(S^i,d_{S^i})$ for testing $f_0$ against $f_i$ with great accuracy (that is, $\delta^i$ is given by (1.1) with $\Lambda_t^1$ replaced by $\Lambda_t^i$). Specifically, if $A=\gamma_0^i/\beta$ and $B=\gamma_i/\alpha$, then $\mathsf{P}_0(d_{S^i}=1)=\alpha(1+o(1))$ and $\mathsf{P}_i(d_{S^i}=0)=\beta(1+o(1))$ as $\alpha,\beta\to0$ (see Siegmund (1975)).
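In the Gaussian case the $\zeta$-number (2.1) is an explicit series. A sketch, assuming $f_0=N(0,1)$ and $f_i=N(\mu,1)$, so that $Z_t^i=\mu\sum_{n\le t}X_n-t\mu^2/2$, $I_i=I_0^i=\mu^2/2$, both probabilities in (2.1) equal $\Phi(-\mu\sqrt{t}/2)$, and $\gamma_i=\gamma_0^i=\zeta_i/I_i$. The truncation length and the function names are illustrative choices.

```python
import math


def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def zeta_gaussian(mu, n_terms=20_000):
    """zeta-number (2.1) for f_0 = N(0, 1) versus f_i = N(mu, 1).

    Both P_0(Z_n^i > 0) and P_i(Z_n^i <= 0) equal Phi(-mu * sqrt(n) / 2);
    the series is truncated after n_terms terms (its terms decay exponentially).
    """
    s = sum(std_normal_cdf(-mu * math.sqrt(n) / 2.0) / n for n in range(1, n_terms + 1))
    return math.exp(-2.0 * s)


mu = 1.0
I = mu ** 2 / 2.0                   # I_i = I_0^i = mu^2 / 2 in this Gaussian example
zeta = zeta_gaussian(mu)
gamma = zeta / I                    # gamma_i = gamma_0^i = zeta_i / I_i
alpha, beta = 1e-3, 1e-2
A, B = gamma / beta, gamma / alpha  # overshoot-corrected SPRT thresholds
print(f"zeta = {zeta:.4f}, gamma = {gamma:.4f}, A = {A:.1f}, B = {B:.1f}")
```

For $\mu=1$ the series gives $\gamma_i$ of about 0.56, so the corrected thresholds are roughly half the uncorrected values $1/\beta$ and $1/\alpha$.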

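If additionally second moments are finite, $\mathsf{E}_i[(Z_1^i)^2],\ \mathsf{E}_0[(Z_1^i)^2]<\infty$, then $\mathcal{H}_i$ and $\mathcal{H}_0^i$ have finite means (average limiting overshoots),

$$\kappa_i=\int_0^\infty x\,\mathcal{H}_i(dx),\qquad\kappa_0^i=\int_0^\infty x\,\mathcal{H}_0^i(dx),$$

and we have the following asymptotic approximations for the expected sample sizes of the SPRT $\delta^i$ as $\alpha,\beta\to0$ so that $\alpha|\log\beta|+\beta|\log\alpha|\to0$:

$$\mathsf{E}_i[S^i]=\frac{1}{I_i}\big(|\log\alpha|+\kappa_i+\log\gamma_i\big)+o(1),\tag{2.5}$$

$$\mathsf{E}_0[S^i]=\frac{1}{I_0^i}\big(|\log\beta|+\kappa_0^i+\log\gamma_0^i\big)+o(1).\tag{2.6}$$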
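The overshoot constants $\gamma_i$ and $\kappa_i$ can also be approximated by brute-force simulation of first passages over a high level, which provides a check on the identity $\gamma_i=\zeta_i/I_i$ and plug-in values for (2.5). A sketch, assuming $f_0=N(0,1)$ and $f_i=N(\mu,1)$; the level 25, the number of replications, and the helper names are arbitrary illustrative choices.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def zeta_gaussian(mu, n_terms=20_000):
    """zeta-number (2.1) for N(0,1) versus N(mu,1), via a truncated series."""
    s = sum(std_normal_cdf(-mu * math.sqrt(n) / 2.0) / n for n in range(1, n_terms + 1))
    return math.exp(-2.0 * s)


def overshoot_stats(mu, level, reps, rng):
    """Monte Carlo estimates of gamma_i = E[exp(-R)] and kappa_i = E[R].

    Under P_i the increments of Z_t^i are mu*X - mu^2/2 with X ~ N(mu, 1);
    R is the overshoot Z_tau - level at the first passage over `level`,
    whose law approximates H_i when the level is large.
    """
    e_sum = r_sum = 0.0
    for _ in range(reps):
        z = 0.0
        while z < level:
            z += mu * rng.gauss(mu, 1.0) - mu * mu / 2.0
        e_sum += math.exp(-(z - level))
        r_sum += z - level
    return e_sum / reps, r_sum / reps


def sprt_ess_approx(I, alpha, kappa, gamma):
    """Approximation (2.5) with the o(1) term dropped."""
    return (abs(math.log(alpha)) + kappa + math.log(gamma)) / I


mu = 1.0
I = mu ** 2 / 2.0
gamma_mc, kappa_mc = overshoot_stats(mu, level=25.0, reps=3000, rng=random.Random(7))
gamma_series = zeta_gaussian(mu) / I          # gamma_i = zeta_i / I_i
print(f"gamma: simulated {gamma_mc:.3f}, series {gamma_series:.3f}; kappa ~ {kappa_mc:.3f}")
print("E_i[S^i] ~", sprt_ess_approx(I, 1e-3, kappa_mc, gamma_mc))
```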
2.2. MiLRT and WGLRT 

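We say that $q=(q^1,\dots,q^K)$ is a weight if $q^i>0$ for all $1\le i\le K$. For any weight $q$, we set $|q|=\sum_{i=1}^{K}q^i$ and we define

$$\bar\Lambda_t(q)=\sum_{i=1}^{K}q^i\Lambda_t^i,\qquad\hat\Lambda_t(q)=\max_{1\le i\le K}\big(q^i\Lambda_t^i\big),\tag{2.7}$$

$$\bar Z_t(q)=\log\bar\Lambda_t(q),\qquad\hat Z_t(q)=\log\hat\Lambda_t(q).\tag{2.8}$$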

The emphasis of this paper is on the MiLRT, $\delta_{mi}=(M,d_M)$, and the WGLRT, $\delta_{gl}=(N,d_N)$, which are parametrized by two arbitrary weights $q_0,q_1$ and are defined as follows:

$$M=\inf\{t:\bar\Lambda_t(q_1)\ge B\ \text{or}\ \bar\Lambda_t(q_0)\le A^{-1}\},\qquad d_M=\mathbb{1}_{\{\bar\Lambda_M(q_1)\ge B\}},$$
$$N=\inf\{t:\hat\Lambda_t(q_1)\ge B\ \text{or}\ \hat\Lambda_t(q_0)\le A^{-1}\},\qquad d_N=\mathbb{1}_{\{\hat\Lambda_N(q_1)\ge B\}}.\tag{2.9}$$

Alternatively, if we introduce the one-sided stopping times

$$M_B^1=\inf\{t:\bar\Lambda_t(q_1)\ge B\},\qquad M_A^0=\inf\{t:\bar\Lambda_t(q_0)\le A^{-1}\},$$
$$N_B^1=\inf\{t:\hat\Lambda_t(q_1)\ge B\},\qquad N_A^0=\inf\{t:\hat\Lambda_t(q_0)\le A^{-1}\},$$

$\delta_{mi}$ and $\delta_{gl}$ can be defined as follows:

$$M=\min\{M_A^0,M_B^1\},\qquad d_M=\mathbb{1}_{\{M_B^1\le M_A^0\}},$$
$$N=\min\{N_A^0,N_B^1\},\qquad d_N=\mathbb{1}_{\{N_B^1\le N_A^0\}}.\tag{2.10}$$

We also define the associated overshoots

$$\bar\eta=\big[\bar Z_M(q_1)-\log B\big]\mathbb{1}_{\{d_M=1\}}-\big[\bar Z_M(q_0)+\log A\big]\mathbb{1}_{\{d_M=0\}},\tag{2.11}$$
$$\hat\eta=\big[\hat Z_N(q_1)-\log B\big]\mathbb{1}_{\{d_N=1\}}-\big[\hat Z_N(q_0)+\log A\big]\mathbb{1}_{\{d_N=0\}},\tag{2.12}$$

which play an important role in the asymptotic analysis of the operating characteristics of the two tests.
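The two tests share one skeleton and differ only in how the weighted likelihood ratios are combined: a sum for the MiLRT and a maximum for the WGLRT. The sketch below works on the log scale, where the sum becomes a log-sum-exp; the input format (an iterator yielding the $K$ increments $\log f_i(X_t)/f_0(X_t)$ at each step) and the function names are assumptions for illustration.

```python
import math
from itertools import repeat


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def weighted_lr_test(llr_stream, q0, q1, A, B, combine):
    """Run the MiLRT (combine=logsumexp) or the WGLRT (combine=max), cf. (2.9).

    llr_stream yields, for t = 1, 2, ..., the list of increments
    log f_i(X_t) / f_0(X_t), i = 1..K (a hypothetical input format).
    Returns (stopping time, terminal decision, overshoot as in (2.11)-(2.12)).
    """
    log_a, log_b = math.log(A), math.log(B)
    log_q0 = [math.log(w) for w in q0]
    log_q1 = [math.log(w) for w in q1]
    z = [0.0] * len(q1)                                           # Z_t^i, i = 1..K
    for t, increments in enumerate(llr_stream, start=1):
        z = [zi + dz for zi, dz in zip(z, increments)]
        upper = combine([lq + zi for lq, zi in zip(log_q1, z)])   # log statistic with q_1
        lower = combine([lq + zi for lq, zi in zip(log_q0, z)])   # log statistic with q_0
        if upper >= log_b:                  # accept H_1 (ties go to H_1, as in (2.9))
            return t, 1, upper - log_b
        if lower <= -log_a:                 # accept H_0
            return t, 0, -(lower + log_a)
    return None                             # stream exhausted before stopping


def milrt(llr_stream, q0, q1, A, B):
    return weighted_lr_test(llr_stream, q0, q1, A, B, logsumexp)


def wglrt(llr_stream, q0, q1, A, B):
    return weighted_lr_test(llr_stream, q0, q1, A, B, max)


q = [0.5, 0.5]
print(milrt(repeat([-1.0, -1.0]), q, q, 10.0, 10.0))  # accepts H_0 at t = 3
print(wglrt(repeat([-1.0, -1.0]), q, q, 10.0, 10.0))  # accepts H_0 at t = 2
```

Since $\bar Z_t(q)\ge\hat Z_t(q)$, for the same weights and thresholds the WGLRT never crosses the upper boundary earlier than the MiLRT, but it can cross the lower boundary earlier, as in the example above.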

3. Asymptotic Approximations for the Operating Characteristics 

In this section, we obtain asymptotic inequalities and approximations for the error probabilities and expected sample sizes of the MiLRT and the WGLRT. In order to do so, we rely on the following decompositions for $\bar Z_t(q)$ and $\hat Z_t(q)$, which hold for every $1\le i\le K$ and any weight $q=(q^1,\dots,q^K)$:

$$\bar Z_t(q)=Z_t^i+\log q^i+\bar Y_t^i(q),\qquad t\in\mathbb{N},\tag{3.1}$$
$$\hat Z_t(q)=Z_t^i+\log q^i+\hat Y_t^i(q),\qquad t\in\mathbb{N},\tag{3.2}$$

where

$$\bar Y_t^i(q)=\log\Big(1+\sum_{j\ne i}\frac{q^j\Lambda_t^j}{q^i\Lambda_t^i}\Big),\tag{3.3}$$
$$\hat Y_t^i(q)=\log\max\Big\{1,\ \max_{j\ne i}\frac{q^j\Lambda_t^j}{q^i\Lambda_t^i}\Big\}.\tag{3.4}$$

From the Strong Law of Large Numbers (SLLN) it follows that, for every $j\ne i$, $\mathsf{P}_i(\Lambda_t^j/\Lambda_t^i\to0)=1$. This implies that $\bar Y_t^i(q)$ and $\hat Y_t^i(q)$ also converge to 0 $\mathsf{P}_i$-a.s., and consequently, they are slowly changing under $\mathsf{P}_i$ (for a precise definition of "slowly changing" we refer to Siegmund (1985), page 190). Since $Z_t^i$ is a random walk under $\mathsf{P}_i$, from this observation and decompositions (3.1)-(3.2) it follows that $\bar Z_t(q)$ and $\hat Z_t(q)$ are perturbed random walks under $\mathsf{P}_i$.

Similarly, in the special case where $r=1$, the SLLN implies that $t^{-1}\log(\Lambda_t^j/\Lambda_t^1)\to I_0^1-I_0^j<0$ $\mathsf{P}_0$-a.s., so that $\mathsf{P}_0(\Lambda_t^j/\Lambda_t^1\to0)=1$ for every $j>1$. Therefore, $\bar Y_t^1(q)$ and $\hat Y_t^1(q)$ also converge to 0 $\mathsf{P}_0$-a.s., and from (3.1)-(3.2) with $i=1$ it follows that $\bar Z_t(q)$ and $\hat Z_t(q)$ are perturbed random walks under $\mathsf{P}_0$ when $r=1$.
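Identities (3.1)-(3.4) hold exactly at every $t$, and along a path generated under $\mathsf{P}_i$ the correction terms $\bar Y_t^i(q)$ and $\hat Y_t^i(q)$ die out. A small numerical check, assuming the Gaussian slippage model with hypothetical signal strengths and weights (0-based channel index in the code):

```python
import math
import random


def log_stats(z, q):
    """Return (bar Z_t(q), hat Z_t(q)) given z = (Z_t^1, ..., Z_t^K), cf. (2.7)-(2.8)."""
    terms = [math.log(w) + zi for w, zi in zip(q, z)]
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms)), m


def y_terms(z, q, i):
    """bar Y_t^i(q) and hat Y_t^i(q) from (3.3)-(3.4), computed on the log scale."""
    log_ratios = [math.log(q[j]) + z[j] - math.log(q[i]) - z[i]
                  for j in range(len(q)) if j != i]
    y_bar = math.log1p(sum(math.exp(r) for r in log_ratios))
    y_hat = max(0.0, max(log_ratios))   # log max{1, max_j ratio_j}
    return y_bar, y_hat


mu = [0.5, 1.0, 2.0]   # hypothetical signal strengths
q = [0.2, 0.3, 0.5]    # hypothetical weight
i = 1                  # data generated under P_i: signal in channel i (0-based)
rng = random.Random(3)
z = [0.0] * len(mu)
max_err = 0.0
for t in range(1, 401):
    x = [rng.gauss(mu[j] if j == i else 0.0, 1.0) for j in range(len(mu))]
    z = [zj + m * xj - m * m / 2.0 for zj, m, xj in zip(z, mu, x)]   # update Z_t^j
    z_bar, z_hat = log_stats(z, q)
    y_bar, y_hat = y_terms(z, q, i)
    base = z[i] + math.log(q[i])                                      # Z_t^i + log q^i
    max_err = max(max_err, abs(z_bar - (base + y_bar)), abs(z_hat - (base + y_hat)))
print(f"max identity error {max_err:.2e}; at t = 400: Ybar = {y_bar:.2e}, Yhat = {y_hat}")
```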

These properties allow us to apply nonlinear renewal theory for perturbed random walks (see Woodroofe (1976, 1982), Lai and Siegmund (1977, 1979), Siegmund (1985)) in order to obtain asymptotic approximations for the expected sample sizes of the tests $\delta_{mi}$ and $\delta_{gl}$ under $\mathsf{P}_i$ for every $1\le i\le K$, as well as under $\mathsf{P}_0$ when $r=1$. An asymptotic approximation for $\mathsf{E}_0[N]$ when $r>1$ can be obtained from the nonlinear renewal theory of Zhang (1988) using the following representation for $N_A^0$:

$$N_A^0=\inf\Big\{t:\ \ell_t^0\ge\log A+\max_{1\le i\le K}\big(\ell_t^i+\log q_0^i\big)\Big\},\tag{3.5}$$

where $\ell_t^j$ is the log-likelihood process under $\mathsf{P}_j$ for $j=0,1,\dots,K$, i.e.,

$$\ell_t^j=\sum_{n=1}^{t}\log f_j(X_n),\qquad t\in\mathbb{N}.\tag{3.6}$$

For the latter approximation we also need some additional notation. Specifically, for any $1\le i\le K$, we set $\mu_i=\mathsf{E}_0[\log f_i(X_1)]$, so that $I_0^i=\mathsf{E}_0[\log f_0(X_1)]-\mu_i$. Moreover, we set $\mu=\max_{1\le i\le K}\mu_i$, so that $I_0=\mathsf{E}_0[\log f_0(X_1)]-\mu$, we define the $r$-dimensional random vector

$$W=\big(\log f_1(X_1)-\mu,\ \dots,\ \log f_r(X_1)-\mu\big),\tag{3.7}$$

and we denote by $\Sigma$ its covariance matrix under $\mathsf{P}_0$. Finally, we set

$$d_r=\frac{h_r}{2\sqrt{I_0}},\qquad h_r=\int_{\mathbb{R}^r}\Big(\max_{1\le i\le r}x_i\Big)\,\phi_\Sigma(x)\,dx,\tag{3.8}$$

where $\phi_\Sigma$ is the density of an $r$-dimensional, zero-mean Gaussian random vector with covariance matrix $\Sigma$.
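The constant $h_r$ in (3.8) is the expected maximum of a correlated zero-mean Gaussian vector and is easy to approximate by Monte Carlo. A sketch in which $\Sigma$ and $I_0$ are hypothetical inputs; for $r=2$ and $\Sigma$ equal to the identity, $h_2=1/\sqrt{\pi}$, which gives a check.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for a small positive-definite matrix S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1):
            s = S[a][b] - sum(L[a][k] * L[b][k] for k in range(b))
            L[a][b] = math.sqrt(s) if a == b else s / L[b][b]
    return L


def estimate_h(S, n_samples, rng):
    """Monte Carlo estimate of h_r = E[max_i G_i] with G ~ N(0, S), cf. (3.8)."""
    L = cholesky(S)
    r = len(S)
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(r)]
        x = [sum(L[a][b] * g[b] for b in range(a + 1)) for a in range(r)]  # x = L g
        total += max(x)
    return total / n_samples


def d_r(S, I0, n_samples=50_000, seed=0):
    """d_r = h_r / (2 sqrt(I_0)), with h_r estimated by simulation."""
    return estimate_h(S, n_samples, random.Random(seed)) / (2.0 * math.sqrt(I0))


# Example: r = 2 uncorrelated unit-variance components, where h_2 = 1/sqrt(pi).
S = [[1.0, 0.0], [0.0, 1.0]]
h2 = estimate_h(S, 100_000, random.Random(0))
print(h2, 1.0 / math.sqrt(math.pi))
I0, log_A = 0.5, math.log(1e3)          # hypothetical I_0 and threshold
print("I_0 E_0[N] ~", log_A + 2.0 * d_r(S, I0) * math.sqrt(log_A))
```

The last line evaluates (3.31) with its $O(1)$ term dropped, so it keeps only the $\log A$ and $\sqrt{\log A}$ terms.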

3.1. Asymptotic bounds for the error probabilities 

We start with the following lemma. 

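Lemma 1. For any $1\le i\le K$,

$$\mathsf{E}_i\big[e^{-\bar\eta}\mathbb{1}_{\{d_M=1\}}\big]\to\gamma_i,\qquad\mathsf{E}_i\big[e^{-\hat\eta}\mathbb{1}_{\{d_N=1\}}\big]\to\gamma_i\qquad\text{as }A,B\to\infty.\tag{3.9}$$

If additionally $r=1$,

$$\mathsf{E}_0\big[e^{-\bar\eta}\mathbb{1}_{\{d_M=0\}}\big]\to\gamma_0^1,\qquad\mathsf{E}_0\big[e^{-\hat\eta}\mathbb{1}_{\{d_N=0\}}\big]\to\gamma_0^1\qquad\text{as }A,B\to\infty.\tag{3.10}$$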
Proof. We only prove the first assertions in (3.9) and (3.10), since the other ones can be proven in an identical way.

Since $M=M_B^1=\inf\{t:\bar Z_t(q_1)\ge\log B\}$ and $\bar\eta=\bar Z_{M_B^1}(q_1)-\log B$ on $\{d_M=1\}=\{M_B^1\le M_A^0\}$, and $\{\bar Z_t(q_1)=Z_t^i+\log q_1^i+\bar Y_t^i(q_1)\}$ is a perturbed random walk under $\mathsf{P}_i$, from nonlinear renewal theory (see, e.g., Theorem 9.12 in Siegmund (1985)) it follows that $\bar\eta$ converges in distribution to $\mathcal{H}_i$ under $\mathsf{P}_i$ on $\{d_M=1\}$. Therefore, the Bounded Convergence Theorem yields $\mathsf{E}_i[e^{-\bar\eta}\mathbb{1}_{\{d_M=1\}}]\to\gamma_i$.

Since $M=M_A^0=\inf\{t:-\bar Z_t(q_0)\ge\log A\}$ and $\bar\eta=|\bar Z_{M_A^0}(q_0)+\log A|$ on $\{d_M=0\}=\{M_B^1>M_A^0\}$, and $\{-\bar Z_t(q_0)=-Z_t^1-\log q_0^1-\bar Y_t^1(q_0)\}$ is a perturbed random walk under $\mathsf{P}_0$ when $r=1$, the same argument as above shows that $\mathsf{E}_0[e^{-\bar\eta}\mathbb{1}_{\{d_M=0\}}]\to\gamma_0^1$. □

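The following theorem provides exact and asymptotic upper bounds on the error probabilities of $\delta_{mi}$ and $\delta_{gl}$.

Theorem 1. (a) For any $A,B>1$,

$$\mathsf{P}_0(d_M=1)\le\frac{|q_1|}{B},\qquad\mathsf{P}_0(d_N=1)\le\frac{|q_1|}{B},\tag{3.11}$$

$$\mathsf{P}_i(d_M=0)\le\frac{1}{Aq_0^i},\qquad\mathsf{P}_i(d_N=0)\le\frac{1}{Aq_0^i},\qquad1\le i\le K.\tag{3.12}$$

(b) As $A,B\to\infty$,

$$\mathsf{P}_0(d_M=1)=\frac{1}{B}\Big(\sum_{i=1}^{K}q_1^i\gamma_i\Big)(1+o(1)),\tag{3.13}$$

$$\mathsf{P}_0(d_N=1)\le\frac{1}{B}\Big(\sum_{i=1}^{K}q_1^i\gamma_i\Big)(1+o(1)).\tag{3.14}$$

If additionally $r=1$, then for every $1\le i\le K$,

$$\mathsf{P}_i(d_M=0)\le\frac{\gamma_0^1}{Aq_0^i}(1+o(1)),\qquad\mathsf{P}_i(d_N=0)\le\frac{\gamma_0^1}{Aq_0^i}(1+o(1)).\tag{3.15}$$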

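Proof. Let us define the probability measure $\mathsf{P}^{q_1}=|q_1|^{-1}\sum_{i=1}^{K}q_1^i\mathsf{P}_i$ and denote by $\mathsf{E}^{q_1}$ expectation with respect to $\mathsf{P}^{q_1}$. Since

$$\frac{d\mathsf{P}^{q_1}}{d\mathsf{P}_0}\Big|_{\mathcal{F}_t}=\frac{1}{|q_1|}\sum_{i=1}^{K}q_1^i\Lambda_t^i=\frac{\bar\Lambda_t(q_1)}{|q_1|},$$

changing the measure $\mathsf{P}_0\mapsto\mathsf{P}^{q_1}$ we have

$$\mathsf{P}_0(d_M=1)=|q_1|\,\mathsf{E}^{q_1}\big[e^{-\bar Z_M(q_1)}\mathbb{1}_{\{d_M=1\}}\big]=\sum_{i=1}^{K}q_1^i\,\mathsf{E}_i\big[e^{-\bar Z_M(q_1)}\mathbb{1}_{\{d_M=1\}}\big].\tag{3.16}$$

Since $\bar Z_M(q_1)=\log B+\bar\eta$ on $\{d_M=1\}$, we obtain

$$\mathsf{P}_0(d_M=1)=\frac{1}{B}\sum_{i=1}^{K}q_1^i\,\mathsf{E}_i\big[e^{-\bar\eta}\mathbb{1}_{\{d_M=1\}}\big].\tag{3.17}$$

Since $\bar\eta$ is nonnegative, the first inequality in (3.11) follows immediately from (3.17), whereas (3.13) follows from (3.9). An argument similar to the one that led to (3.16), along with the fact that $\bar Z_t(q_1)\ge\hat Z_t(q_1)$, yields

$$\mathsf{P}_0(d_N=1)=\sum_{i=1}^{K}q_1^i\,\mathsf{E}_i\big[e^{-\bar Z_N(q_1)}\mathbb{1}_{\{d_N=1\}}\big]\le\sum_{i=1}^{K}q_1^i\,\mathsf{E}_i\big[e^{-\hat Z_N(q_1)}\mathbb{1}_{\{d_N=1\}}\big]=\frac{1}{B}\sum_{i=1}^{K}q_1^i\,\mathsf{E}_i\big[e^{-\hat\eta}\mathbb{1}_{\{d_N=1\}}\big].\tag{3.18}$$

Relationship (3.18) and the fact that $\hat\eta$ is nonnegative imply the second inequality in (3.11), whereas (3.14) follows from (3.9).

Finally, changing the measure $\mathsf{P}_i\mapsto\mathsf{P}_0$, we obtain

$$\mathsf{P}_i(d_M=0)=\mathsf{E}_0\big[e^{Z_M^i}\mathbb{1}_{\{d_M=0\}}\big].\tag{3.19}$$

Since $Z_M^i=\bar Z_M(q_0)-\log q_0^i-\bar Y_M^i(q_0)$ (recall (3.1)), $\bar Z_M(q_0)=-\log A-\bar\eta$ on $\{d_M=0\}$ (recall (2.11)) and $\bar Y_M^i(q_0)\ge0$, it follows that $Z_M^i\le-\log(Aq_0^i)-\bar\eta$ on $\{d_M=0\}$ and, consequently, (3.19) yields

$$\mathsf{P}_i(d_M=0)\le\frac{1}{Aq_0^i}\,\mathsf{E}_0\big[e^{-\bar\eta}\mathbb{1}_{\{d_M=0\}}\big].$$

Since $\bar\eta$ is nonnegative, we obtain the first inequality in (3.12), whereas from (3.10) we obtain the first inequality in (3.15). The remaining inequalities in (3.12) and (3.15) can be shown in a similar way. □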



From Theorem 1(a) it is clear that when $A,B$ are selected according to

$$A=\frac{1}{\beta\,\min_{1\le i\le K}q_0^i},\qquad B=\frac{|q_1|}{\alpha},\tag{3.20}$$

then $\delta_{mi},\delta_{gl}\in\mathcal{C}_{\alpha,\beta}$. Moreover, from Theorem 1(b) it follows that we can obtain sharper inequalities if we correct for the overshoots by selecting $A,B$ as follows:

$$A=\frac{\gamma_0^1}{\beta\,\min_{1\le i\le K}q_0^i},\qquad B=\frac{1}{\alpha}\sum_{i=1}^{K}q_1^i\gamma_i.\tag{3.21}$$

Indeed, with this selection of the thresholds we have $\mathsf{P}_0(d_M=1)=\alpha(1+o(1))$ and $\mathsf{P}_0(d_N=1)\le\alpha(1+o(1))$, and if additionally $r=1$, $\max_{1\le i\le K}\mathsf{P}_i(d_M=0)\le\beta(1+o(1))$ and $\max_{1\le i\le K}\mathsf{P}_i(d_N=0)\le\beta(1+o(1))$.
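Both threshold rules are explicit functions of the weights. A sketch; the $\gamma_i$ values in the example are hypothetical placeholders, and the type-II part of (3.21) relies on $r=1$.

```python
def conservative_thresholds(q0, q1, alpha, beta):
    """Thresholds (3.20); by Theorem 1(a) they put both tests in C_{alpha,beta}."""
    A = 1.0 / (beta * min(q0))
    B = sum(q1) / alpha          # |q_1| / alpha
    return A, B


def corrected_thresholds(q0, q1, gamma, gamma0_1, alpha, beta):
    """Overshoot-corrected thresholds (3.21).

    gamma[i] holds gamma_{i+1} and gamma0_1 holds gamma_0^1. The type-II
    guarantee behind this choice, (3.15), is asymptotic and assumes r = 1.
    """
    A = gamma0_1 / (beta * min(q0))
    B = sum(w * g for w, g in zip(q1, gamma)) / alpha
    return A, B


q0 = q1 = [1.0, 1.0, 1.0]        # uniform weights (for the WGLRT, this is the GLRT)
gamma = [0.56, 0.45, 0.30]       # hypothetical gamma_i values, each in (0, 1)
print(conservative_thresholds(q0, q1, alpha=0.01, beta=0.1))
print(corrected_thresholds(q0, q1, gamma, gamma0_1=0.56, alpha=0.01, beta=0.1))
```

Since every $\gamma_i$ is below 1, the corrected thresholds are smaller, so the tests stop sooner while still meeting the error constraints asymptotically (for the type-II constraint, when $r=1$).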

3.2. Asymptotic approximations to expected sample sizes 

In order to obtain asymptotic approximations to the expected sample sizes of the MiLRT and the WGLRT, we make the following assumptions, which are needed for all the results in the rest of the paper:

(A1) $\mathsf{E}_i[(Z_1^i)^2]<\infty$ and $\mathsf{E}_0[(Z_1^i)^2]<\infty$, $1\le i\le K$;

(A2) $\alpha,\beta\to0$ so that $|\log\alpha|/|\log\beta|\to k$, where $k\in(0,\infty)$;

(A3) For $T=M$ or $T=N$, $A$ and $B$ are selected so that, as $\alpha,\beta\to0$,

$$k_0\,\alpha\,(1+o(1))\le\mathsf{P}_0(d_T=1)\le\alpha\,(1+o(1)),\tag{3.22}$$

$$k_1\,\beta\,(1+o(1))\le\max_{1\le i\le K}\mathsf{P}_i(d_T=0)\le\beta\,(1+o(1)),\tag{3.23}$$

or equivalently,

$$|\log\alpha|+o(1)\le|\log\mathsf{P}_0(d_T=1)|\le|\log\alpha|+|\log k_0|+o(1),\tag{3.24}$$

$$|\log\beta|+o(1)\le\Big|\log\max_{1\le i\le K}\mathsf{P}_i(d_T=0)\Big|\le|\log\beta|+|\log k_1|+o(1),\tag{3.25}$$

where $k_0,k_1\in(0,1]$ are fixed constants, not necessarily the same for $\delta_{mi}$ and $\delta_{gl}$.

The second-moment conditions (A1) on the log-likelihood ratio $Z_1^i$ are required even for the asymptotic approximations (2.5)-(2.6) to the performance of the SPRT for testing $f_0$ against $f_i$. Assumption (A2) concerns the relative rates at which $\alpha$ and $\beta$ go to 0 and requires that $\alpha$ not go to 0 exponentially faster than $\beta$, and vice versa. Note, however, that $\alpha$ can still be much smaller than $\beta$ (or vice versa), a natural requirement in many applications. Assumption (A3) requires that the thresholds for both the MiLRT and the WGLRT are designed so that the probabilities of the type-I and type-II errors are asymptotically bounded by (and at the same time not much smaller than) $\alpha$ and $\beta$ respectively. As the following lemma shows, (A3) connects the thresholds $A$ and $B$ with the desired error probabilities $\alpha$ and $\beta$, so that we do not need to impose constraints on the relative rates at which $A$ and $B$ go to infinity beyond (A2).

Lemma 2. If (A3) holds, then $\log B=|\log\alpha|+O(1)$ and $\log A=|\log\beta|+O(1)$.

Proof. From (3.11) we know that $\log B\le|\log\mathsf{P}_0(d_M=1)|+\log|q_1|$, whereas from (A3), and in particular (3.24), it follows that $|\log\mathsf{P}_0(d_M=1)|\le|\log\alpha|+|\log k_0|+o(1)$, which proves $\log B=|\log\alpha|+O(1)$. The second relationship can be shown in a similar way. □

Theorem 2. If conditions (A1)-(A3) hold, then

(a) for every $1\le i\le K$,

$$I_i\,\mathsf{E}_i[M]=\log B+\kappa_i-\log q_1^i+o(1),\tag{3.26}$$
$$I_i\,\mathsf{E}_i[N]=\log B+\kappa_i-\log q_1^i+o(1);\tag{3.27}$$

(b) for $r=1$,

$$I_0\,\mathsf{E}_0[M]=\log A+\kappa_0^1+\log q_0^1+o(1),\tag{3.28}$$
$$I_0\,\mathsf{E}_0[N]=\log A+\kappa_0^1+\log q_0^1+o(1);\tag{3.29}$$

(c) for $r>1$,

$$I_0\,\mathsf{E}_0[M]=\log A+2d_r\sqrt{\log A}+O(1),\tag{3.30}$$
$$I_0\,\mathsf{E}_0[N]=\log A+2d_r\sqrt{\log A}+O(1),\tag{3.31}$$

where $d_r$ is defined in (3.8).

Proof. (a) Asymptotic approximations (3.26) and (3.27) can be established relatively easily using nonlinear renewal theory. Specifically, starting from representation (3.1) and applying the Nonlinear Renewal Theorem (see Theorem 9.28 in Siegmund (1985)), it can be shown (as in Theorem 2.1 of Fellouris and Tartakovsky (2012)) that $I_i\,\mathsf{E}_i[M_B^1]$ is equal to the right-hand side of (3.26) as $B\to\infty$. Therefore, to prove (3.26) it suffices to show that $\mathsf{E}_i[M_B^1-M]=o(1)$ as $A,B\to\infty$, or equivalently as $\alpha,\beta\to0$. To this end, note that

$$0\le M_B^1-M=\big[M_B^1-M_A^0\big]\mathbb{1}_{\{d_M=0\}}\le M_B^1\,\mathbb{1}_{\{d_M=0\}}.$$

Applying the Cauchy-Schwarz inequality, we obtain

$$\mathsf{E}_i\big[M_B^1\mathbb{1}_{\{d_M=0\}}\big]\le\sqrt{\mathsf{E}_i\big[(M_B^1)^2\big]\,\mathsf{P}_i(d_M=0)}.\tag{3.32}$$

From (3.1) and (3.3) it is clear that $\bar Z_t(q_1)\ge Z_t^i+\log q_1^i$, $t\in\mathbb{N}$; thus,

$$M_B^1\le\inf\{t:\ Z_t^i\ge\log(B/q_1^i)\}.$$

Consequently, since (A1) holds, Theorem 8.1 in Gut (2008) implies that

$$\mathsf{E}_i\big[(M_B^1)^2\big]\le\Big(\frac{\log(B/q_1^i)}{I_i}\Big)^2(1+o(1)).$$

From the latter inequality and Lemma 2 we conclude that

$$\mathsf{E}_i\big[(M_B^1)^2\big]=O\big((\log B)^2\big)=O\big(|\log\alpha|^2\big).$$

Moreover, since (A3) implies $\mathsf{P}_i(d_M=0)\le\beta(1+o(1))$, (3.32) becomes

$$\mathsf{E}_i\big[M_B^1\mathbb{1}_{\{d_M=0\}}\big]=O\big(\sqrt{|\log\alpha|^2\,\beta}\big),$$

and from (A2) we conclude that this upper bound goes to 0. This completes the proof of (3.26), whereas the proof of (3.27) is analogous.

(b) From representation (3.1) and the Nonlinear Renewal Theorem it follows that $I_0\,\mathsf{E}_0[M_A^0]$ is equal to the right-hand side of (3.28) as $A\to\infty$. Then, similarly to (a), we can show that $\mathsf{E}_0[M_A^0-M]=o(1)$. The proof of (3.29) follows similar steps.

(c) In order to prove (3.31), we start from representation (3.5) and apply the nonlinear renewal theory of Zhang (1988). As a result, it can be shown (analogously to Lemma 2.1 of Dragalin (1999)) that $I_0\,\mathsf{E}_0[N_A^0]$ is equal to the right-hand side of (3.31). Thus, it suffices to show that $\mathsf{E}_0[N_A^0]=\mathsf{E}_0[N]+o(1)$, which can be done in just the same way as in (a) and (b). □

Remark 1. Asymptotic approximation (3.31) can be further improved (up to the negligible term $o(1)$) if stronger integrability conditions are postulated on the vector $W$ defined in (3.7). Specifically, if in addition we assume the third-moment condition $\mathsf{E}_0[\|W\|^3]<\infty$ as well as the Cramér-type condition $\limsup_{\|t\|\to\infty}\big|\mathsf{E}_0[e^{\mathrm{j}\langle t,W\rangle}]\big|<1$, where $\mathrm{j}$ is the imaginary unit, $t=(t_1,\dots,t_r)$ and $\langle t,W\rangle=\sum_{i=1}^{r}t_iW_i$, then the following expansion holds

I E [N] =\ogA + 2d T VlogA + d, 2 + 7^ + 4 

+ / \max(xi) [P(x) + A(q ) £~V] 1 fa(x) dx + o(l), 
Jw {}■<*<? J 

where $\Delta(q_0)=(\log q_0^1,\dots,\log q_0^r)$ and $\mathcal{P}$ is a third-degree polynomial whose coefficients depend on the $\mathsf{P}_0$-cumulants of $W$ (see Bhattacharya and Rao (1986)). This approximation can be derived similarly to Theorem 3.3 of Dragalin et al. (2000), based on the nonlinear renewal theory of Zhang (1988).

Corollary 1. Suppose that (A1)-(A3) hold with $k_0=1$, i.e., $A$ and $B$ are selected so that $\mathsf{P}_0(d_M=1)\sim\alpha$ and $\mathsf{P}_0(d_N=1)\sim\alpha$. Then, for every $1\le i\le K$,

$$I_i\,\mathsf{E}_i[M]=|\log\alpha|+\log\Big(\sum_{j=1}^{K}q_1^j\gamma_j\Big)+\kappa_i-\log q_1^i+o(1),\tag{3.33}$$

$$I_i\,\mathsf{E}_i[N]\le|\log\alpha|+\log\Big(\sum_{j=1}^{K}q_1^j\gamma_j\Big)+\kappa_i-\log q_1^i+o(1).\tag{3.34}$$

Proof. From (3.13)-(3.14) it follows that

$$\log B=|\log\mathsf{P}_0(d_M=1)|+\log\Big(\sum_{j=1}^{K}q_1^j\gamma_j\Big)+o(1),$$

$$\log B\le|\log\mathsf{P}_0(d_N=1)|+\log\Big(\sum_{j=1}^{K}q_1^j\gamma_j\Big)+o(1).$$

Moreover, from (3.24) and the assumption that $k_0=1$ we have

$$|\log\mathsf{P}_0(d_M=1)|=|\log\alpha|+o(1)\quad\text{and}\quad|\log\mathsf{P}_0(d_N=1)|=|\log\alpha|+o(1).$$

From these two relationships and Theorem 2(a) we obtain the desired result. □
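Approximation (3.33) is explicit in the weights and the renewal constants, so it can be tabulated directly. A sketch with hypothetical values of $I_i$, $\kappa_i$, $\gamma_i$ (in practice they depend on the model); the $o(1)$ term is dropped and the function name is illustrative.

```python
import math


def milrt_ess_approx(i, alpha, q1, I, kappa, gamma):
    """Right-hand side of (3.33) divided by I_i: approximate E_i[M].

    Lists are 0-based: q1[j], I[j], kappa[j], gamma[j] refer to alternative j + 1.
    For the WGLRT the same expression is an asymptotic upper bound, by (3.34).
    """
    log_mix = math.log(sum(w * g for w, g in zip(q1, gamma)))
    return (abs(math.log(alpha)) + log_mix + kappa[i] - math.log(q1[i])) / I[i]


# Hypothetical inputs for K = 3 alternatives.
I = [0.5, 1.0, 2.0]
kappa = [0.70, 0.95, 1.40]
gamma = [0.56, 0.45, 0.30]
q1 = [1.0, 1.0, 1.0]
for i in range(3):
    print(i + 1, milrt_ess_approx(i, 1e-4, q1, I, kappa, gamma))
```

For $K=1$ and $q_1=1$ the formula reduces to (2.5), and it is unchanged if $q_1$ is multiplied by a positive constant.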

4. Asymptotic Optimality Properties 

In this section, we establish the asymptotic optimality properties of the MiLRT and the 
WGLRT. 

4.1. Uniform asymptotic optimality 

First, we show that both tests minimize the expected sample size within an $O(1)$ term (i.e., to second order) under every $\mathsf{P}_i$, $1\le i\le K$, and at least to first order under $\mathsf{P}_0$.

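Theorem 3. Suppose that conditions (A1)-(A3) hold and that $A,B$ are selected so that $\delta_{mi},\delta_{gl}\in\mathcal{C}_{\alpha,\beta}$.

(a) For every $1\le i\le K$,

$$\mathsf{E}_i[M]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_i[T]+O(1),\tag{4.1}$$

$$\mathsf{E}_i[N]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_i[T]+O(1).\tag{4.2}$$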
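(b) If $r=1$, then

$$\mathsf{E}_0[M]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]+O(1),\tag{4.3}$$
$$\mathsf{E}_0[N]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]+O(1),\tag{4.4}$$

whereas if $r>1$,

$$\mathsf{E}_0[M]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]\,(1+o(1)),\tag{4.5}$$
$$\mathsf{E}_0[N]=\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]\,(1+o(1)).\tag{4.6}$$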



Proof. (a) From (2.5) it is clear that

$$I_i\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_i[T]\ge|\log\alpha|+O(1),\tag{4.7}$$

whereas from Theorem 2(a) and Lemma 2 it follows that

$$I_i\,\mathsf{E}_i[M]=\log B+O(1)=|\log\alpha|+O(1),$$

which proves (4.1). The proof of (4.2) is similar.

(b) From (2.6) it is clear that for every $1\le i\le K$ we have

$$\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]\ge\frac{|\log\beta|}{I_0^i}+O(1);\tag{4.8}$$

thus, recalling from (2.4) that $I_0=\min_{1\le i\le K}I_0^i$, we obtain

$$\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}_0[T]\ge\frac{|\log\beta|}{I_0}+O(1).\tag{4.9}$$

But from Theorem 2(b)-(c) and Lemma 2 it follows that

$$I_0\,\mathsf{E}_0[M]=\begin{cases}\log A+O(1)=|\log\beta|+O(1), & \text{if } r=1,\\ \log A\,(1+o(1))=|\log\beta|\,(1+o(1)), & \text{if } r>1,\end{cases}\tag{4.10}$$

which implies (4.3) and (4.5). The proofs of (4.4) and (4.6) are similar. □
4.2. Almost optimality 

In what follows, we denote by $\delta^*_{mi}(p)=(M^*(p),d_{M^*(p)})$ and $\delta^*_{gl}(p)=(N^*(p),d_{N^*(p)})$ the MiLRT and the WGLRT with weights given by (1.6), i.e.,

$$q_0^i=p_i\,\zeta_i\quad\text{and}\quad q_1^i=\frac{p_i}{\zeta_i},\qquad1\le i\le K,\tag{4.11}$$

where $p=(p_1,\dots,p_K)$, $p_i>0$ for every $1\le i\le K$ and $\sum_{i=1}^{K}p_i=1$. Our goal is to show that $\delta^*_{mi}(p)$ and $\delta^*_{gl}(p)$ attain $\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}^p[T]$ asymptotically within an $o(1)$ term, where $\mathsf{E}^p$ is expectation with respect to the weighted probability measure $\mathsf{P}^p=\sum_{i=1}^{K}p_i\mathsf{P}_i$. Before doing so, note that Corollary 1 implies that if $B$ is selected so that $\mathsf{P}_0(d_{M^*(p)}=1)\sim\alpha$ and $\mathsf{P}_0(d_{N^*(p)}=1)\sim\alpha$, then

$$\mathsf{E}_i[M^*(p)]=\frac{1}{I_i}\big[|\log\alpha|+\kappa_i+\log\gamma_i+C_i(p)\big]+o(1),\tag{4.12}$$

$$\mathsf{E}_i[N^*(p)]\le\frac{1}{I_i}\big[|\log\alpha|+\kappa_i+\log\gamma_i+C_i(p)\big]+o(1),\tag{4.13}$$

where we have used the fact that $\zeta_i=\gamma_iI_i$, $1\le i\le K$, and we have introduced the notation

$$C_i(p)=\log\Big(\frac{I_i}{p_i}\sum_{j=1}^{K}\frac{p_j}{I_j}\Big),\qquad1\le i\le K.\tag{4.14}$$
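Theorem 4. Suppose that conditions (A1)-(A3) hold with $k=1$, i.e., $\alpha,\beta\to0$ so that $|\log\alpha|\sim|\log\beta|$. Then

$$\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}^p[T]=\sum_{i=1}^{K}\frac{p_i}{I_i}\big[|\log\alpha|+\kappa_i+\log\gamma_i+C_i(p)\big]+o(1).\tag{4.15}$$

Moreover, if $A,B$ are selected so that $\delta^*_{mi}(p)$ and $\delta^*_{gl}(p)$ belong to $\mathcal{C}_{\alpha,\beta}$ and $k_0=1$, i.e., $\mathsf{P}_0(d_{M^*(p)}=1)\sim\alpha$ and $\mathsf{P}_0(d_{N^*(p)}=1)\sim\alpha$, then

$$\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}^p[T]=\mathsf{E}^p[M^*(p)]+o(1),\qquad\inf_{\delta\in\mathcal{C}_{\alpha,\beta}}\mathsf{E}^p[T]=\mathsf{E}^p[N^*(p)]+o(1).$$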

In order to prove this theorem, we formulate our sequential testing problem as a Bayesian sequential decision problem with $K+1$ states, $H_0: f=f_0$ and $H_1^i: f=f_i$, $1\le i\le K$, and two possible actions upon stopping, accepting either $H_0$ or $H_1=\bigcup_{i=1}^{K}H_1^i$. Moreover, we denote by $c$ the sampling cost per observation and by $w_1$ (resp. $w_0$) the loss associated with accepting $H_0$ (resp. $H_1$) when the correct hypothesis is $H_1$ (resp. $H_0$). We also define the probability measure $\mathsf{P}^\pi=\pi\,\mathsf{P}_0+(1-\pi)\,\mathsf{P}^p$, which means that $\pi=\mathsf{P}^\pi(H_0)$ is the prior probability of $H_0$ and $p_i=\mathsf{P}^\pi(H_1^i\mid H_1)$ is the prior probability of $f=f_i$ given that $H_1$ is correct.

The integrated risk of a sequential test $\delta=(T,d_T)$ is defined as the sum $\mathcal{R}(\delta)=\mathcal{R}_c(T)+\mathcal{R}_s(d_T)$, where $\mathcal{R}_c(T)$ is the integrated risk due to sampling and $\mathcal{R}_s(d_T)$ is the integrated risk due to a wrong decision upon stopping, i.e.,

$$\mathcal{R}_c(T)=c\,\mathsf{E}^\pi[T]=c\big[\pi\,\mathsf{E}_0[T]+(1-\pi)\,\mathsf{E}^p[T]\big],$$

$$\mathcal{R}_s(d_T)=w_0\,\mathsf{P}^\pi(d_T=1,\,H_0)+w_1\,\mathsf{P}^\pi(d_T=0,\,H_1)=\pi w_0\,\mathsf{P}_0(d_T=1)+(1-\pi)w_1\,\mathsf{P}^p(d_T=0).$$

The Bayesian sequential decision problem is to find an optimal (Bayes) sequential test that attains the Bayes risk $\mathcal{R}^* = \inf_\delta \mathcal{R}(\delta)$. It is well known that the solution to this problem does not have a simple structure (see, e.g., Chow et al. (1971)). However, from the seminal work of Lorden (1977) on finite-state sequential decision making it follows that $\delta^*_{mi}(p)$ and $\delta^*_{gl}(p)$ are almost Bayes when the thresholds $A$ and $B$ are chosen as

$$A_c = \frac{1 - \pi}{\pi}\,\frac{w_1}{c} \quad \text{and} \quad B_c = \frac{\pi}{1 - \pi}\,\frac{w_0}{c}. \tag{4.16}$$

More specifically, denote by $\delta^*_{mi,c}(p) = (M^*_c(p), d_{M^*_c(p)})$ and $\delta^*_{gl,c}(p) = (N^*_c(p), d_{N^*_c(p)})$ the sequential tests $\delta^*_{mi}(p)$ and $\delta^*_{gl}(p)$ when the thresholds are given by $A_c$ and $B_c$. Under the integrability condition (A1), it follows from Lorden (1977) that

$$\mathcal{R}(\delta^*_{mi,c}(p)) - \mathcal{R}^* = o(c) \quad \text{and} \quad \mathcal{R}(\delta^*_{gl,c}(p)) - \mathcal{R}^* = o(c). \tag{4.17}$$

The proof of Theorem 4 relies on this third-order Bayesian asymptotic optimality property, which requires the symmetric thresholds (4.16) and is the reason why we assumed in Theorem 4 that the error probabilities go to zero at the same rate.
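The arithmetic in (4.16), together with the choice $\pi = \varepsilon/(1 + \varepsilon)$ made in the proof below, can be checked with the small sketch that follows. It also encodes the integrated risk $\mathcal{R}(\delta)$ in terms of the operating characteristics of a test, which are treated here as given inputs rather than computed; all function names are hypothetical.

```python
def bayes_thresholds(pi, w0, w1, c):
    """Thresholds (4.16): A_c = (1 - pi) w1 / (pi c), B_c = pi w0 / ((1 - pi) c)."""
    if not (0.0 < pi < 1.0) or c <= 0.0:
        raise ValueError("need 0 < pi < 1 and c > 0")
    A_c = (1.0 - pi) * w1 / (pi * c)
    B_c = pi * w0 / ((1.0 - pi) * c)
    return A_c, B_c


def integrated_risk(pi, w0, w1, c, e0_T, ep_T, p0_reject, pp_accept):
    """R = c [pi E_0 T + (1 - pi) E^p T] + pi w0 P_0(d_T = 1) + (1 - pi) w1 P^p(d_T = 0)."""
    sampling = c * (pi * e0_T + (1.0 - pi) * ep_T)
    decision = pi * w0 * p0_reject + (1.0 - pi) * w1 * pp_accept
    return sampling + decision


if __name__ == "__main__":
    eps, w0, w1, c = 0.1, 2.0, 3.0, 1e-3
    pi = eps / (1.0 + eps)  # the prior used to obtain (4.25)
    A_c, B_c = bayes_thresholds(pi, w0, w1, c)
    print(A_c, w1 / (eps * c))  # both should equal w1 / (eps c)
    print(B_c, eps * w0 / c)    # both should equal eps w0 / c
```

The same function also confirms the identities $\pi w_0 / B_c = (1 - \pi)c$ and $(1 - \pi) w_1 / A_c = \pi c$, which are what turn error-probability bounds of order $1/B_c$ and $1/A_c$ into terms of order $c$.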

Proof. In order to lighten the notation, we omit the dependence on the prior distribution $p$ and write simply $\delta^*_{mi} = (M^*, d_{M^*})$ and $\delta^*_{mi,c} = (M^*_c, d_{M^*_c})$ instead of $\delta^*_{mi}(p) = (M^*(p), d_{M^*(p)})$ and $\delta^*_{mi,c}(p) = (M^*_c(p), d_{M^*_c(p)})$ (and similarly for the WGLRT).

From Corollary 1 it is clear that the right-hand side in (4.15) is attained by $\delta^*_{mi}$ and $\delta^*_{gl}$ when their thresholds are selected so that $P_0(d_{M^*} = 1) \sim \alpha$ and $P_0(d_{N^*} = 1) \sim \alpha$. If additionally $\delta^*_{mi}, \delta^*_{gl} \in C_{\alpha,\beta}$, then $\inf_{\delta \in C_{\alpha,\beta}} E^p[T]$ is attained by these two tests to within an $o(1)$ term. Thus, it suffices to establish (4.15).

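Consider the class of sequential tests

$$C^p_{\alpha,\beta} = \{\delta : P_0(d_T = 1) \le \alpha \ \text{and} \ P^p(d_T = 0) \le \beta\}.$$

Since $C_{\alpha,\beta} \subset C^p_{\alpha,\beta}$, we have $\inf_{\delta \in C_{\alpha,\beta}} E^p[T] \ge \inf_{\delta \in C^p_{\alpha,\beta}} E^p[T]$. Thus, it suffices to show that

$$\inf_{\delta \in C^p_{\alpha,\beta}} E^p[T] = \sum_{i=1}^K \frac{p_i}{I_i}\Big[|\log\alpha| + \kappa_i + \log\gamma_i + C_i(p)\Big] + o(1). \tag{4.18}$$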
Consider now the sequential test $\delta^*_{mi,c} = (M^*_c, d_{M^*_c})$ with thresholds $A_c$ and $B_c$ selected so that $P_0(d_{M^*_c} = 1) = \alpha$ and $P^p(d_{M^*_c} = 0) = \beta$. From Corollary 1 it is clear that $E^p[M^*_c]$ is equal to the right-hand side in (4.18) as $c \to 0$, which means that it suffices to show that

$$\inf_{\delta \in C^p_{\alpha,\beta}} E^p[T] = E^p[M^*_c] + o(1),$$

where $o(1)$ is an asymptotically negligible term as $c \to 0$. More specifically, since $\delta^*_{mi,c} \in C^p_{\alpha,\beta}$, if $\delta$ is an arbitrary sequential test in $C^p_{\alpha,\beta}$, we need to show that, for sufficiently small $c$, $E^p[M^*_c] - E^p[T]$ is bounded above by an arbitrarily small, but fixed, number.
First of all, we observe that

$$\mathcal{R}_s(d_T) = \pi w_0 P_0(d_T = 1) + (1 - \pi) w_1 P^p(d_T = 0) \le \pi w_0 \alpha + (1 - \pi) w_1 \beta = \mathcal{R}_s(d_{M^*_c}), \tag{4.19}$$

where the inequality is due to $\delta \in C^p_{\alpha,\beta}$ and the second equality follows from the assumption that $P_0(d_{M^*_c} = 1) = \alpha$ and $P^p(d_{M^*_c} = 0) = \beta$.

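From (3.11)-(3.12) and the definition of $A_c$ and $B_c$ in (4.16) we have

$$\begin{aligned}
\mathcal{R}_s(d_{M^*_c}) &= \pi w_0 P_0(d_{M^*_c} = 1) + (1 - \pi) w_1 P^p(d_{M^*_c} = 0) \\
&\le \pi w_0 \frac{|q|}{B_c} + (1 - \pi) w_1 \sum_{i=1}^K \frac{p_i}{q_i A_c} \\
&= |q|(1 - \pi)c + \pi c \sum_{i=1}^K \frac{p_i}{q_i} \le (Q - 1)c,
\end{aligned} \tag{4.20}$$

where $Q > 1$ is some constant that does not depend on $c$ or $\pi$.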
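Fix $\varepsilon > 0$ and introduce the following sequential test:

$$T_{\varepsilon c} = \min\{M^*_{\varepsilon c}, T\}, \qquad d_{T_{\varepsilon c}} = d_T \mathbb{1}_{\{T \le M^*_{\varepsilon c}\}} + d_{M^*_{\varepsilon c}} \mathbb{1}_{\{T > M^*_{\varepsilon c}\}}.$$

Obviously,

$$\mathcal{R}_s(d_{T_{\varepsilon c}}) \le \mathcal{R}_s(d_T) + \mathcal{R}_s(d_{M^*_{\varepsilon c}}) \le \mathcal{R}_s(d_{M^*_c}) + \mathcal{R}_s(d_{M^*_{\varepsilon c}}) \le \mathcal{R}_s(d_{M^*_c}) + (Q - 1)c\varepsilon, \tag{4.21}$$

where the first inequality holds because $d_{T_{\varepsilon c}}$ is wrong only if $d_T$ or $d_{M^*_{\varepsilon c}}$ is wrong, the second one is due to (4.19), and the third one is due to (4.20) with $c$ replaced by $\varepsilon c$.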
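Since $\delta^*_{mi,c}$ is almost Bayes (recall (4.17)), for all sufficiently small $c$

$$\mathcal{R}_c(M^*_c) + \mathcal{R}_s(d_{M^*_c}) \le \mathcal{R}_c(T_{\varepsilon c}) + \mathcal{R}_s(d_{T_{\varepsilon c}}) + c\varepsilon. \tag{4.22}$$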
Then, from (4.21) we obtain $\mathcal{R}_c(M^*_c) \le \mathcal{R}_c(T_{\varepsilon c}) + Qc\varepsilon$, and consequently, after dividing by $c$,

$$\pi E_0[M^*_c] + (1 - \pi) E^p[M^*_c] \le \pi E_0[T_{\varepsilon c}] + (1 - \pi) E^p[T_{\varepsilon c}] + Q\varepsilon \le \pi E_0[M^*_{\varepsilon c}] + (1 - \pi) E^p[T] + Q\varepsilon, \tag{4.23}$$

where the second inequality follows from the definition of $T_{\varepsilon c}$. Rearranging terms, we obtain from (4.23) that

$$E^p[M^*_c] - E^p[T] \le \frac{\pi}{1 - \pi}\big(E_0[M^*_{\varepsilon c}] - E_0[M^*_c]\big) + \frac{Q\varepsilon}{1 - \pi}. \tag{4.24}$$

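Since the last inequality holds for any $\pi \in (0, 1)$, we can set $\pi = \varepsilon/(1 + \varepsilon)$, which implies $B_c = \varepsilon w_0/c$ and $A_c = w_1/(\varepsilon c)$, whereas (4.24) becomes

$$E^p[M^*_c] - E^p[T] \le \varepsilon\big(E_0[M^*_{\varepsilon c}] - E_0[M^*_c]\big) + Q\varepsilon(1 + \varepsilon). \tag{4.25}$$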

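But from (3.28) and (3.31) it follows that, as $c \to 0$,

$$E_0[M^*_{\varepsilon c}] - E_0[M^*_c] = O(\log A_{\varepsilon c} - \log A_c),$$

and from (4.16) we have $\log A_{\varepsilon c} - \log A_c = |\log\varepsilon| + O(1)$ as $c \to 0$. Hence the right-hand side of (4.25) is $O(\varepsilon|\log\varepsilon|) + Q\varepsilon(1 + \varepsilon)$, which can be made arbitrarily small by choosing $\varepsilon$ small enough; this completes the proof. □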

Remark 2. With a similar argument as the one used in the proof of Theorem 4 it can be shown that if $P_0(d_{M^*(p)} = 1) = \alpha$ and $P^p(d_{M^*(p)} = 0) = \beta$, then

$$\inf_{\delta \in C_{\alpha,\beta}} E^p[T] \ge \inf_{\delta \in C^p_{\alpha,\beta}} E^p[T] = E^p[M^*(p)] + o(1),$$

and similarly for $\delta^*_{gl}(p)$. However, the right-hand side in this asymptotic lower bound is generally not attained by $\delta^*_{mi}(p)$ or $\delta^*_{gl}(p)$ when their thresholds are selected so that $\delta^*_{mi}(p), \delta^*_{gl}(p) \in C_{\alpha,\beta}$.

Remark 3. While we have no rigorous proof, we strongly believe that the assertions of Theorem 4 (as well as of Theorem 5 below) hold true in the more general case where $\alpha$ and $\beta$ approach zero in such a way that the ratio $\log\alpha / \log\beta$ is bounded away from zero and infinity, which allows one to cover the asymptotically asymmetric case as well.

4.3. Almost minimaxity 

For any stopping time $T$ and $1 \le i \le K$, we set $\mathcal{I}_i[T] = I_i E_i[T]$. Without loss of generality, we restrict ourselves to $P_i$-integrable stopping times; thus, from Wald's identity it follows that

$$\mathcal{I}_i[T] = E_i\Big[\sum_{n=1}^{T} \log\frac{f_i(X_n)}{f_0(X_n)}\Big].$$

In other words, $\mathcal{I}_i[T]$ is the expected KL divergence between $P_i$ and $P_0$ that is accumulated up to time $T$. Let $\hat{p} = (\hat{p}_1, \ldots, \hat{p}_K)$ denote the prior distribution for which

$$\hat{p}_i = \frac{c_i e^{\kappa_i}}{\sum_{j=1}^K c_j e^{\kappa_j}}, \quad 1 \le i \le K. \tag{4.26}$$

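Then, from (3.26)-(3.27) it follows that $\hat{p}$ (almost) equalizes the KL divergence that is accumulated by both the MiLRT and the WGLRT until stopping, in the sense that $\mathcal{I}_i[M^*(\hat{p})]$ and $\mathcal{I}_i[N^*(\hat{p})]$ are independent of $i$ up to an $o(1)$ term. Indeed,

$$\mathcal{I}_i[M^*(\hat{p})] = \log B + \log\Big(\sum_{j=1}^K c_j e^{\kappa_j}\Big) + o(1), \tag{4.27}$$

$$\mathcal{I}_i[N^*(\hat{p})] = \log B + \log\Big(\sum_{j=1}^K c_j e^{\kappa_j}\Big) + o(1), \tag{4.28}$$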

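where only the negligible terms $o(1)$ may depend on $i$. If additionally $B$ is selected so that $P_0(d_{M^*(\hat{p})} = 1) \sim \alpha$ and $P_0(d_{N^*(\hat{p})} = 1) \sim \alpha$, then (3.33)-(3.34) imply that for every $1 \le i \le K$,

$$\mathcal{I}_i[M^*(\hat{p})] = |\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big) + o(1), \tag{4.29}$$

$$\mathcal{I}_i[N^*(\hat{p})] \le |\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big) + o(1), \tag{4.30}$$

and consequently, if we denote by $\mathcal{I}[T] = \max_{1 \le i \le K} \mathcal{I}_i[T]$ the maximal expected KL divergence until stopping, we have

$$\mathcal{I}[M^*(\hat{p})] = |\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big) + o(1), \tag{4.31}$$

$$\mathcal{I}[N^*(\hat{p})] \le |\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big) + o(1). \tag{4.32}$$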
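The equalizing property of $\hat{p}$ can be checked numerically from (4.12) and (4.14): with the $o(1)$ terms dropped, $I_i E_i[M^*(\hat{p})]$ should not depend on $i$. The sketch below assumes, purely for illustration, the exponential model of Section 5.2 (where $I_i = \theta_i - \log(1 + \theta_i)$, $\kappa_i = \theta_i$, $\gamma_i = (1 + \theta_i)^{-1}$) with hypothetical signal strengths $\theta = (0.5, 1, 2)$; the function names are also hypothetical.

```python
import math


def exp_model_constants(theta):
    """(I, kappa, gamma) for the exponential model (5.3), as stated in Section 5.2."""
    return theta - math.log1p(theta), theta, 1.0 / (1.0 + theta)


def p_hat(I, kappa, gamma):
    """Almost least favorable prior (4.26): p_i proportional to c_i e^{kappa_i}, c_i = gamma_i I_i."""
    w = [g * i * math.exp(k) for i, k, g in zip(I, kappa, gamma)]
    s = sum(w)
    return [x / s for x in w]


def accumulated_kl(p, I, kappa, gamma, alpha):
    """I_i E_i[M*(p)] from (4.12) and (4.14) with o(1) dropped, for every i."""
    s = sum(pj / Ij for pj, Ij in zip(p, I))
    la = abs(math.log(alpha))
    return [la + k + math.log(g) + math.log(s * Ii / pi)
            for pi, Ii, k, g in zip(p, I, kappa, gamma)]


if __name__ == "__main__":
    thetas = [0.5, 1.0, 2.0]  # hypothetical signal strengths
    I, kappa, gamma = zip(*(exp_model_constants(t) for t in thetas))
    alpha = 1e-4
    target = abs(math.log(alpha)) + math.log(sum(g * math.exp(k) for g, k in zip(gamma, kappa)))
    print(accumulated_kl(p_hat(I, kappa, gamma), I, kappa, gamma, alpha), target)
    print(accumulated_kl([1 / 3] * 3, I, kappa, gamma, alpha))  # uniform prior: not equalized
```

Under $\hat{p}$ every entry equals the common value in (4.29), whereas under the uniform prior the entries differ, which is exactly why the maximum over $i$ is larger for that choice.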

The following theorem states that $\delta^*_{mi}(\hat{p})$ and $\delta^*_{gl}(\hat{p})$ are almost minimax in this KL sense.

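Theorem 5. Suppose that conditions (A1)-(A3) hold with $k = 1$ and that $\alpha, \beta \to 0$ so that $|\log\alpha| \sim |\log\beta|$. Then,

$$\inf_{\delta \in C_{\alpha,\beta}} \mathcal{I}[T] = |\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big) + o(1). \tag{4.33}$$

If additionally $A, B$ are selected so that $\delta^*_{mi}(\hat{p}), \delta^*_{gl}(\hat{p}) \in C_{\alpha,\beta}$ and $k_0 = 1$, i.e., $P_0(d_{M^*(\hat{p})} = 1) \sim \alpha$ and $P_0(d_{N^*(\hat{p})} = 1) \sim \alpha$, then

$$\inf_{\delta \in C_{\alpha,\beta}} \mathcal{I}[T] = \mathcal{I}[M^*(\hat{p})] + o(1), \tag{4.34}$$

$$\inf_{\delta \in C_{\alpha,\beta}} \mathcal{I}[T] = \mathcal{I}[N^*(\hat{p})] + o(1). \tag{4.35}$$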

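Proof. Suppose that the thresholds $A$ and $B$ are selected so that $\delta^*_{mi}(\hat{p}) \in C_{\alpha,\beta}$ and $P_0(d_{M^*(\hat{p})} = 1) \sim \alpha$. From Theorem 4 it follows that

$$\sum_{i=1}^K \hat{p}_i E_i[M^*(\hat{p})] + o(1) \le \inf_{\delta \in C_{\alpha,\beta}} \sum_{i=1}^K \hat{p}_i E_i[T] = \inf_{\delta \in C_{\alpha,\beta}} \sum_{i=1}^K \frac{\hat{p}_i}{I_i} \mathcal{I}_i[T] \le \Big(\sum_{i=1}^K \frac{\hat{p}_i}{I_i}\Big) \inf_{\delta \in C_{\alpha,\beta}} \mathcal{I}[T], \tag{4.36}$$

whereas from (4.29) and (4.31) we have

$$\sum_{i=1}^K \hat{p}_i E_i[M^*(\hat{p})] = \Big(\sum_{i=1}^K \frac{\hat{p}_i}{I_i}\Big)\Big[|\log\alpha| + \log\Big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\Big)\Big] + o(1) \tag{4.37}$$

$$= \Big(\sum_{i=1}^K \frac{\hat{p}_i}{I_i}\Big)\, \mathcal{I}[M^*(\hat{p})] + o(1). \tag{4.38}$$

From (4.36) and (4.37) we obtain (4.33), whereas from (4.36) and (4.38) we obtain (4.34). Finally, from (4.32) and (4.33) we obtain (4.35). □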

5. How to Select p? 

In this section, we consider the specification of the prior distribution $p$, which determines the weights $q_0$ and $q_i$ of the MiLRT and the WGLRT when the weights are selected according to (1.6). Our goal is to select a robust prior, which inflicts a small performance loss under every scenario. In other words, we want to avoid a prior distribution that leads to sequential tests with very good behavior for some densities in $H_1$, but with poor behavior for others.

5.1. Performance measures 

We will quantify the "performance loss" of the MiLRT (and similarly for the WGLRT) under $P_i$ by the following measure,

$$\mathcal{J}_i(p) = \frac{E_i[M^*(p)] - E_i[S^i]}{E_i[S^i]}, \quad 1 \le i \le K,$$

where we recall that $S^i$ is the SPRT for testing $f_0$ against $f_i$. That is, $\mathcal{J}_i(p)$ represents the additional expected sample size due to the uncertainty in the alternative hypothesis divided by the smallest possible expected sample size that is required for testing $f_0$ against $f_i$. Moreover, if $S^i$ has error probabilities $\alpha$ and $\beta$, assumptions (A1)-(A3) hold and $k = 1$, then from (2.5) and (4.12) it follows that

$$\mathcal{J}_i(p) \approx \frac{C_i(p)}{|\log\alpha| + \kappa_i + \log\gamma_i} = \frac{\log\big(\sum_{j=1}^K p_j/I_j\big) + \log I_i - \log p_i}{|\log\alpha| + \kappa_i + \log\gamma_i}, \tag{5.1}$$

where by $\approx$ we mean that the two sides differ by an $o(1)$ term. From this expression we can see that the magnitude of $\mathcal{J}_i(p)$ is mainly determined by $K$, the number of densities in $H_1$, and the probability of type-I error $\alpha$. In particular, for every $1 \le i \le K$ and $p$, $\mathcal{J}_i(p)$ will be "small" when $|\log\alpha|$ is much larger than $\log K$, which implies that the choice of $p$ may make a difference only when $|\log\alpha|$ is not much larger than $\log K$.

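Table 1. Asymptotic performance loss for different prior distributions

$p_i$ (up to normalization) | $q_i$ | $C_i(p)$
$c_i$ | $1$ | $-\log\gamma_i + \log\big(\sum_{j=1}^K \gamma_j\big)$
$I_i$ | $1/\gamma_i$ | $\log K$
$e^{\kappa_i} c_i$ | $e^{\kappa_i}$ | $-\log(\gamma_i e^{\kappa_i}) + \log\big(\sum_{j=1}^K \gamma_j e^{\kappa_j}\big)$
$1$ | $1/c_i$ | $\log I_i + \log\big(\sum_{j=1}^K 1/I_j\big)$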
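Each row of Table 1 follows from substituting the corresponding prior into (4.14). The sketch below checks that algebra numerically for randomly drawn positive constants $I_i$, $\kappa_i$, $\gamma_i$ (arbitrary values, not tied to any model); the priors are left unnormalized because $C_i(p)$ is invariant to rescaling $p$, and the function names are hypothetical.

```python
import math
import random


def C(p, I):
    """C_i(p) from (4.14)."""
    s = sum(pj / Ij for pj, Ij in zip(p, I))
    return [math.log(s * Ii / pi) for pi, Ii in zip(p, I)]


def table1_rows(I, kappa, gamma):
    """Unnormalized priors of Table 1 paired with the closed-form C_i(p) listed there."""
    K = len(I)
    c = [g * i for g, i in zip(gamma, I)]
    sum_g = sum(gamma)
    sum_ge = sum(g * math.exp(k) for g, k in zip(gamma, kappa))
    sum_inv_I = sum(1.0 / i for i in I)
    return {
        "p^c": (c, [math.log(sum_g) - math.log(g) for g in gamma]),
        "p^I": (list(I), [math.log(K)] * K),
        "p_hat": ([ci * math.exp(k) for ci, k in zip(c, kappa)],
                  [math.log(sum_ge) - math.log(g * math.exp(k)) for g, k in zip(gamma, kappa)]),
        "p^u": ([1.0] * K, [math.log(i) + math.log(sum_inv_I) for i in I]),
    }


def max_discrepancy(I, kappa, gamma):
    """Largest |C_i(p) - closed form| over all rows of the table and all i."""
    worst = 0.0
    for prior, closed in table1_rows(I, kappa, gamma).values():
        worst = max(worst, max(abs(a - b) for a, b in zip(C(prior, I), closed)))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    K = 6
    I = [rng.uniform(0.1, 3.0) for _ in range(K)]
    kappa = [rng.uniform(0.0, 2.0) for _ in range(K)]
    gamma = [rng.uniform(0.1, 0.9) for _ in range(K)]
    print(max_discrepancy(I, kappa, gamma))  # should be at round-off level
```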



Moreover, from (5.1) it is clear that a good choice for $p$ would guarantee that $C_i(p)$ is "small" for every $1 \le i \le K$. In Table 1, we present $C_i(p)$ for the almost least favorable distribution $\hat{p}$, defined in (4.26), as well as for some other intuitively appealing choices of $p$. In particular, we consider the priors $p^I$, $p^c$, $p^u$, which are defined so that

$$p^I_i \propto I_i, \quad p^c_i \propto c_i, \quad p^u_i \propto 1, \quad 1 \le i \le K.$$

Note that $p^c$, $p^I$, $\hat{p}$ are ranked, in the sense that $c_i < I_i < e^{\kappa_i} c_i$, since $c_i = \gamma_i I_i$ and $\gamma_i < 1 < e^{\kappa_i}\gamma_i$. Thus, $p^c$ (resp. $\hat{p}$) assigns relatively less (resp. more) weight than $p^I$ to a hypothesis as its "signal-to-noise ratio" increases. Note also that $p^c$ and $\hat{p}$ reduce to $p^I$ when there is no overshoot effect, in which case $\kappa_i = 0$ and $\gamma_i = 1$, whereas all three priors reduce to $p^u$ in the symmetric case where $I_i$, $\kappa_i$ and $\gamma_i$ do not depend on $i$.

5.2. Numerical comparisons 

In order to make some concrete comparisons, we focus on the multichannel setup (1.5), assuming that $\{g_i^0, g_i^1\}$ can be embedded in a parametric family $g(x; \theta)$, so that

$$g_i^0(x) = g(x; \theta = 0) \quad \text{and} \quad g_i^1(x) = g(x; \theta_i), \quad 1 \le i \le K, \tag{5.2}$$

where $\theta_i > 0$ expresses the "signal-to-noise ratio" in channel $i$, $1 \le i \le K$.
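Consider the exponential model

$$g(x; \theta) = \frac{1}{1 + \theta}\, e^{-x/(1 + \theta)}, \quad x > 0. \tag{5.3}$$

Then $I_i$, $\kappa_i$ and $\gamma_i$ take the following form:

$$I_i = \theta_i - \log(1 + \theta_i), \qquad \kappa_i = \theta_i, \qquad \gamma_i = (1 + \theta_i)^{-1}.$$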
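For the exponential model these constants can be checked by simulation. Under $g(\cdot; \theta_i)$ the per-observation log-likelihood ratio is $\theta_i x/(1 + \theta_i) - \log(1 + \theta_i)$, and $\theta_i x/(1 + \theta_i)$ is exponential with mean $\theta_i$; by memorylessness, the overshoot of the cumulative log-likelihood ratio over any level $b > 0$ is then exactly exponential with mean $\theta_i$, so its mean and the mean of $e^{-\text{overshoot}}$ should match $\kappa_i$ and $\gamma_i$. The same runs also illustrate Wald's identity from Section 4.3, $E_i[\text{log-likelihood ratio at } T] = I_i E_i[T]$. This is a minimal sketch; the level $b$, the seed and the number of runs are arbitrary choices, and the function names are hypothetical.

```python
import math
import random


def exponential_constants(theta):
    """(I, kappa, gamma) stated in the text for model (5.3)."""
    return theta - math.log1p(theta), theta, 1.0 / (1.0 + theta)


def crossing(theta, b, rng):
    """Sum log[g(X; theta)/g(X; 0)] with X ~ g(.; theta) until the sum reaches b.

    Returns the stopping time and the overshoot (sum at stopping minus b).
    """
    slope, shift = theta / (1.0 + theta), math.log1p(theta)
    s, n = 0.0, 0
    while s < b:
        x = rng.expovariate(1.0 / (1.0 + theta))  # X has mean 1 + theta
        s += slope * x - shift
        n += 1
    return n, s - b


def monte_carlo(theta, b=10.0, runs=20000, seed=1):
    """Estimate E[T], E[overshoot] and E[exp(-overshoot)] by simulation."""
    rng = random.Random(seed)
    sum_t = sum_r = sum_er = 0.0
    for _ in range(runs):
        n, r = crossing(theta, b, rng)
        sum_t += n
        sum_r += r
        sum_er += math.exp(-r)
    return sum_t / runs, sum_r / runs, sum_er / runs


if __name__ == "__main__":
    theta, b = 1.0, 10.0
    I, kappa, gamma = exponential_constants(theta)
    mean_t, mean_r, mean_er = monte_carlo(theta, b)
    print("kappa", kappa, "vs", mean_r)
    print("gamma", gamma, "vs", mean_er)
    print("Wald: I * E[T] =", I * mean_t, "vs b + E[overshoot] =", b + mean_r)
```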

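For the Gaussian model $g(x; \theta) = \mathcal{N}(x; \theta, 1)$, where $\mathcal{N}(x; \mu, \sigma)$ is the density of the normal distribution with mean $\mu$ and standard deviation $\sigma$, the above quantities become

$$I_i = \frac{\theta_i^2}{2}, \qquad \kappa_i = 1 + \frac{\theta_i^2}{4} - \theta_i \sum_{n=1}^\infty \Big[\frac{1}{\sqrt{n}}\,\varphi\Big(\frac{\theta_i\sqrt{n}}{2}\Big) - \frac{\theta_i}{2}\,\Phi\Big(-\frac{\theta_i\sqrt{n}}{2}\Big)\Big], \qquad \gamma_i = \frac{2}{\theta_i^2}\exp\Big\{-2\sum_{n=1}^\infty \frac{1}{n}\,\Phi\Big(-\frac{\theta_i\sqrt{n}}{2}\Big)\Big\},$$

where $\varphi$ and $\Phi$ denote the standard normal density and distribution function.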
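In the Gaussian case $\kappa_i$ and $\gamma_i$ are given by series over $n \ge 1$ in $\varphi(u_n)$ and $\Phi(-u_n)$ with $u_n = \theta_i\sqrt{n}/2$, and the terms decay like $e^{-u_n^2/2}$, so truncation is harmless. A minimal sketch that evaluates the series is given below; the cutoff is an arbitrary choice and the function names are hypothetical. The accompanying checks only use facts stated in the text ($I_i = \theta_i^2/2$ and $\gamma_i < 1 < e^{\kappa_i}\gamma_i$) together with the large-$\theta$ behavior of the formulas, in which the correction sums vanish.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_constants(theta, cutoff=10.0):
    """(I, kappa, gamma) for g(x; theta) = N(x; theta, 1), truncating the series.

    kappa = 1 + theta^2/4 - theta * sum_n [phi(u_n)/sqrt(n) - (theta/2) Phi(-u_n)]
    gamma = (2/theta^2) * exp(-2 * sum_n Phi(-u_n)/n),   u_n = theta sqrt(n) / 2.
    Terms with u_n > cutoff are dropped as negligible.
    """
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    s_kappa = 0.0
    s_gamma = 0.0
    n = 1
    while True:
        u = theta * math.sqrt(n) / 2.0
        if u > cutoff:
            break
        s_kappa += phi(u) / math.sqrt(n) - (theta / 2.0) * Phi(-u)
        s_gamma += Phi(-u) / n
        n += 1
    I = theta ** 2 / 2.0
    kappa = 1.0 + theta ** 2 / 4.0 - theta * s_kappa
    gamma = (2.0 / theta ** 2) * math.exp(-2.0 * s_gamma)
    return I, kappa, gamma


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 4.0):
        I, kappa, gamma = gaussian_constants(theta)
        print(theta, I, kappa, gamma, math.exp(kappa) * gamma)
```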

Assume, for simplicity, that $\theta_i = 4$ for $1 \le i \le K/2$ and $\theta_i = \theta$ for $K/2 < i \le K$. Thus, the "expected" signal in the first channel is stronger than that in the last channel when $\theta < 4$, and weaker when $\theta > 4$.

Our goal is to evaluate $\mathcal{J}_1(p)$ and $\mathcal{J}_K(p)$, i.e., the inflicted performance loss when the signal is present in the first and in the last channel, respectively, as a function of $\theta$, for different prior distributions. We do so using asymptotic approximation (5.1), in which we have set $K = 10$ and $\alpha = 10^{-4}$, and we present the results for the exponential case in Figure 1 and for the Gaussian case in Figure 2.
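The computation behind Figures 1 and 2 can be sketched directly from (5.1). The following Python code does this for the exponential model only, with $K = 10$, $\alpha = 10^{-4}$ and $\theta_i = 4$ in the first half of the channels, as in the text; it returns $(\mathcal{J}_1(p), \mathcal{J}_K(p))$ for each of the four priors and does not attempt to redraw the figures. The function names and dictionary keys are hypothetical.

```python
import math


def exponential_constants(theta):
    """(I, kappa, gamma) for model (5.3)."""
    return theta - math.log1p(theta), theta, 1.0 / (1.0 + theta)


def priors(I, kappa, gamma):
    """The four priors compared in Section 5, each normalized to sum to one."""
    c = [g * i for g, i in zip(gamma, I)]
    raw = {
        "p_hat": [ci * math.exp(k) for ci, k in zip(c, kappa)],
        "p^I": list(I),
        "p^c": c,
        "p^u": [1.0] * len(I),
    }
    return {name: [w / sum(v) for w in v] for name, v in raw.items()}


def loss(p, I, kappa, gamma, alpha):
    """Approximation (5.1): J_i(p) ~ C_i(p) / (|log alpha| + kappa_i + log gamma_i)."""
    s = sum(pj / Ij for pj, Ij in zip(p, I))
    la = abs(math.log(alpha))
    return [math.log(s * Ii / pi) / (la + k + math.log(g))
            for pi, Ii, k, g in zip(p, I, kappa, gamma)]


def first_last_loss(theta, K=10, alpha=1e-4, strong=4.0):
    """(J_1, J_K) per prior when theta_i = strong for i <= K/2 and theta_i = theta otherwise."""
    thetas = [strong] * (K // 2) + [theta] * (K - K // 2)
    I, kappa, gamma = zip(*(exponential_constants(t) for t in thetas))
    out = {}
    for name, p in priors(I, kappa, gamma).items():
        J = loss(p, I, kappa, gamma, alpha)
        out[name] = (J[0], J[-1])
    return out


if __name__ == "__main__":
    for theta in (0.5, 1.0, 2.0, 4.0, 8.0):
        res = first_last_loss(theta)
        print(theta, {k: (round(a, 3), round(b, 3)) for k, (a, b) in res.items()})
```

At $\theta = 4$ all channels are identical, so every prior reduces to $p^u$ and the four pairs coincide, which is a convenient sanity check.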

The plots in both figures show that setting $p = \hat{p}$ (resp. $p = p^u$) leads to a better performance when the signal is present in the channel with the stronger (resp. weaker) signal-to-noise ratio. However, the inflicted performance loss when the signal is present in the other channel can be very high. On the other hand, setting $p = p^I$ or $p = p^c$ leads to a more robust performance, since the performance loss is similar (and relatively small) irrespective of the channel in which the signal is present and of the relative signal strengths.



[Figure 1: panels "First Channel (θ = 4)" and "Second Channel (θ = x)"; curves for the priors $\hat{p}$, $p^I$, $p^c$, $p^u$.]

FIGURE 1. Performance loss for different prior distributions in a multichannel problem with exponential data.

FIGURE 2. Performance loss for different prior distributions in a multichannel problem with Gaussian data.

6. Monte Carlo Simulations

In this section, we present a simulation study whose goal is to check the accuracy of the asymptotic approximations established in Section 4 and to compare the MiLRT with the WGLRT for realistic error probabilities. In particular, we consider the multichannel setup (1.5) with $K = 3$ channels, exponential distributions given by (5.2)-(5.3) and parameter values selected according to Table 2. Since our main emphasis is on the fast detection of signal, we set $\beta = 10^{-2}$ and consider different values of $\alpha$. Moreover, we choose the thresholds $A$ and $B$ according to (3.21), whereas we select the weights according to (1.6) with $p = p^I$.

TABLE 2. Parameter values in a multichannel problem with exponential data



0i 


h 




7i 


4 


Qo 


0.5 


0.095 


0.5 


0.67 


0.308 


0.013 


1 


0.584 


1 


0.4 


0.837 


0.078 


2 


0.901 


2 


0.33 


1.380 


0.138 



In the first three columns of Table 3 we compare the type-I error probabilities of the two tests, which have been computed by simulation, against the target level $\alpha$. More specifically, these error probabilities are computed using representations (3.16) and (3.18) and importance sampling, a simulation technique whose application in Sequential Analysis goes back to Siegmund (1976). These results indicate that selecting $B$ according to (3.20) leads to type-I error probabilities very close to $\alpha$ for both tests, even for relatively large $\alpha$. In particular, we see that $P_0(d_{M^*} = 1)$ is slightly larger than $\alpha$, which is expected, since (3.20) implies $P_0(d_{M^*} = 1) \sim \alpha$, whereas we also observe that $\alpha$ is a sharp upper bound for $P_0(d_{N^*} = 1)$, the type-I error probability of the WGLRT.
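The importance-sampling idea can be sketched as follows. If $Q = \sum_i w_i P_i$ with weights $w_i$ summing to one, then on the first $n$ observations $dQ/dP_0 = \Lambda_n = \sum_i w_i \Lambda_{i,n}$, so for a test stopping at $T$ one has $P_0(d_T = 1) = E^Q[\Lambda_T^{-1}; d_T = 1]$, which can be estimated by simulating under $Q$ instead of $P_0$. The code below is a hypothetical illustration only: it assumes a mixture statistic with weights $w_i$, stopping at the first exit of $\Lambda_n$ from $(1/A, B)$, the exponential multichannel model (5.2)-(5.3) and the illustrative values $\theta = (0.5, 1, 2)$; these are not the weights (1.6), the thresholds (3.20)-(3.21) or the representations (3.16) and (3.18) used in the study.

```python
import math
import random


def llr_increment(x, theta):
    """log g(x; theta) - log g(x; 0) for the exponential model (5.3)."""
    return theta / (1.0 + theta) * x - math.log1p(theta)


def run_mixture_test(thetas, w, A, B, signal, rng):
    """One path of a mixture-LR test that stops when Lambda_n leaves (1/A, B).

    Lambda_n = sum_i w_i Lambda_{i,n}, where Lambda_{i,n} uses channel i only.
    `signal` is the channel carrying the signal, or None to sample under P_0.
    Returns (decision, Lambda at stopping).
    """
    llr = [0.0] * len(thetas)
    while True:
        for j, th in enumerate(thetas):
            mean = 1.0 + th if j == signal else 1.0
            llr[j] += llr_increment(rng.expovariate(1.0 / mean), th)
        lam = sum(wj * math.exp(lj) for wj, lj in zip(w, llr))
        if lam >= B:
            return 1, lam
        if lam <= 1.0 / A:
            return 0, lam


def type1_importance(thetas, w, A, B, runs=5000, seed=11):
    """Estimate P_0(d = 1) as the average of 1{d = 1} / Lambda_T under Q = sum_i w_i P_i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        i = rng.choices(range(len(w)), weights=w)[0]  # draw the signal channel under Q
        d, lam = run_mixture_test(thetas, w, A, B, i, rng)
        if d == 1:
            total += 1.0 / lam
    return total / runs


def type1_crude(thetas, w, A, B, runs=5000, seed=12):
    """Plain Monte Carlo estimate of P_0(d = 1), for comparison."""
    rng = random.Random(seed)
    hits = sum(run_mixture_test(thetas, w, A, B, None, rng)[0] for _ in range(runs))
    return hits / runs
```

For example, `type1_importance([0.5, 1.0, 2.0], [1/3] * 3, 10.0, 10.0)` can be compared with `type1_crude` on the same inputs. Because $\Lambda_T \ge B$ whenever $d_T = 1$, every summand in the importance-sampling estimate is at most $1/B$, which is what keeps its variance small when the error probability is tiny.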

In the remaining columns of Table 3, we present for both tests the (simulated) expected sample size under $P_i$, $i = 1, 2, 3$, and in Figure 3 we plot these values against the corresponding (simulated) type-I error probabilities. In these graphs, we also superimpose asymptotic approximation (3.33) (dashed line), as well as the asymptotic performance of the corresponding SPRT, (2.5), which is given by the solid line. Triangles correspond to the WGLRT and circles to the MiLRT. From these results we can see, first of all, that asymptotic approximation (3.33) is very accurate for both tests. Moreover, we can see that the two tests have similar performance. In particular, their performance is essentially identical when the signal is present in the channel with the smallest signal strength. In the other two cases, the MiLRT seems to perform slightly better; however, the difference is small.

Table 3. Type-I error probabilities and the expected sample sizes under $P_i$, $i = 1, 2, 3$, for different values of the target probability $\alpha$ when $\beta = 10^{-2}$.

$\alpha$ | $P_0(d_{M^*} = 1)/\alpha$ | $P_0(d_{N^*} = 1)/\alpha$ | $E_1[M^*]$ | $E_1[N^*]$ | $E_2[M^*]$ | $E_2[N^*]$ | $E_3[M^*]$ | $E_3[N^*]$
$10^{-2}$ | 1.051 | 0.994 | 59.9 | 59.4 | 17.8 | 19.4 | 6.2 | 7.3
$10^{-3}$ | 1.033 | 0.995 | 84.1 | 84.1 | 25.7 | 27.1 | 9.0 | 9.9
$10^{-4}$ | 1.025 | 0.996 | 108.5 | 108.3 | 33.7 | 34.6 | 11.7 | 12.4
$10^{-5}$ | 1.017 | 0.996 | 132.5 | 132.3 | 41.4 | 42.0 | 14.3 | 15.0

7. Conclusion 

In this work, we performed a detailed analysis and optimization of weighted GLR and mixture-based sequential tests when the null hypothesis is simple and the alternative hypothesis is composite but discrete. Irrespective of the choice of weights, both tests minimize asymptotically, at least to first order and often to second order, the expected sample size under each possible scenario as error probabilities go to 0. However, with appropriate selection of weights, both tests achieve higher-order asymptotic optimality properties. Specifically, they minimize a weighted expected sample size as well as the expected Kullback-Leibler divergence in the least favorable scenario to within asymptotically negligible terms as error probabilities go to zero. Moreover, based on simulation experiments, we can conclude that the two tests perform similarly even for not too small error probabilities. Finally, we believe that the proposed approach can be extended to sequential testing of multiple hypotheses, a substantially more complex problem that we plan to consider elsewhere.
Figure 3. Expected sample size of MiLRT and WGLRT under $P_i$ against type-I error probability (in logarithmic scale), $i = 1, 2, 3$. The dashed line represents asymptotic approximation (3.33), whereas the solid line refers to (2.5), the asymptotic performance of the corresponding SPRT. The triangles (resp. circles) represent the simulated performance of the WGLRT (resp. MiLRT).